October 1984
AT&T Vol. 63 No. 8 Part 1


BELL LABORATORIES 


TECHNICAL 
JOURNAL 


A JOURNAL OF THE AT&T COMPANIES


QAM Binary Signaling 
Digital Radio 

Speech Processing 
Packet/Circuit Switching — 


Traffic 


EDITORIAL COMMITTEE 


A. A. PENZIAS,’ Committee Chairman 


M. M. BUCHNER, JR.! R. C. FLETCHER! J. S. NOWAK! 

R. P. CLAGETT2 D. HIRSCH* B. B. OLIVER? 

R. P. CREAN? S. HORING' J. W. TIMKO? 

B. R. DARNALL! R. A. KELLEY! V. A. VYSSOTSKY!
B. P. DONOHUE, III? J. F. MARTIN? 


¹AT&T Bell Laboratories ²AT&T Technologies ³AT&T Information Systems
⁴AT&T Consumer Products ⁵AT&T Communications


EDITORIAL STAFF 
B. G. KING, Editor L. S. GOLLER, Assistant Editor 
P. WHEELER, Managing Editor A. M. SHARTS, Assistant Editor 
B. G. GRUBER, Circulation 


AT&T BELL LABORATORIES TECHNICAL JOURNAL (ISSN 0005-8580) is published ten times
each year by AT&T, 550 Madison Avenue, New York, NY 10022; C. L. Brown, Chairman of 
the Board; T. O. Davis, Secretary. The Computing Science and Systems section and the special 
issues are included as they become available. Subscriptions: United States—1 year $35; 2 
years $63; 3 years $84; foreign—1 year $45; 2 years $73; 3 years $94. Payment for foreign 
subscriptions or single copies must be made in United States funds, or by check drawn on a
United States bank and made payable to the Technical Journal and sent to AT&T Bell 
Laboratories, Circulation Dept., Room 1E335, 101 J. F. Kennedy Pky, Short Hills, NJ 07078. 


Single copies of material from this issue of the Journal may be reproduced for personal, 
noncommercial use. Permission to make multiple copies must be obtained from the Editor. 


Comments on the technical content of any article or brief are welcome. These and other 
editorial inquiries should be addressed to the Editor, AT&T Bell Laboratories Technical Journal, 
Room 1H321, 101 J. F. Kennedy Pky, Short Hills, NJ 07078. Comments and inquiries, whether 
or not published, shall not be regarded as confidential or otherwise restricted in use and will 
become the property of AT&T. Comments selected for publication may be edited for brevity, 
subject to author approval. 


Printed in U.S.A. Second-class postage paid at Short Hills, NJ 07078 and additional mailing 
offices. Postmaster: Send address changes to the AT&T Bell Laboratories Technical Journal, 
Room 1E335, 101 J. F. Kennedy Pky, Short Hills, NJ 07078.


Copyright © 1984 AT&T. 


AT&T Bell Laboratories 


Technical Journal 


VOL. 63 OCTOBER 1984 NO. 8, PART 1 




Contrasting Performance of Faster Binary Signaling With QAM
G. J. Foschini

Adaptive Transversal Equalization of Multipath Propagation for 16-QAM, 90-Mb/s Digital Radio
G. L. Fenderson, J. W. Parker, P. D. Quigley, S. R. Shepard, and C. A. Siller, Jr.

Enhancement of ADPCM Speech by Adaptive Postfiltering
V. Ramamoorthy and N. S. Jayant

On Using the Itakura-Saito Measures for Speech Coder Performance Evaluation
B.-H. Juang

A Packet/Circuit Switch
Z. L. Budrikis and A. N. Netravali

An Approximate Analysis of Sojourn Times in the M/G/1 Queue With Round-Robin Service Discipline
P. J. Fleming

Analysis of a TDMA Network With Voice and Data Traffic
M. L. Honig

PAPERS BY AT&T BELL LABORATORIES AUTHORS

CONTENTS, NOVEMBER ISSUE


AT&T Bell Laboratories Technical Journal 
Vol. 63, No. 8, October 1984 
Printed in U.S.A. 


Contrasting Performance of Faster Binary 
Signaling With QAM 


By G. J. FOSCHINI* 
(Manuscript received November 15, 1983) 


In this paper we determine the performance of Faster Binary Signaling 
(FBS), an alternative method to Quadrature Amplitude Modulation (QAM) 
for achieving a high bit rate over an ideal, bandlimited, noisy channel. With 
this method, signaling is faster than the Nyquist rate. Consequently, there are 
fewer points in the signal constellation, resulting in a greater separation of 
the points when the average transmitter power is the same as for QAM. Thus, 
at the expense of introducing Intersymbol Interference (ISI), there is an 
apparent improvement in noise immunity. The ISI can be mitigated with 
maximum likelihood sequence detection. We explore the advisability of trading 
freedom from ISI for added noise immunity for the extreme case where the 
system with faster signaling uses a four-point constellation. The question of 
the efficacy of FBS has been difficult to approach, but FBS has loomed as a 
possibly strong competitor among alternatives to QAM. We show here how to 
analyze FBS, and we give examples involving FBS operating at up to five 
times the QAM rate. In the examples, FBS is revealed to be, at best, of 
marginal value even if one allows for implementation capabilities far beyond 
those of forthcoming processors. 


I. INTRODUCTION 


For data communication over channels such as voiceband analog 
telephone circuits, satellite links, or terrestrial digital radio hops, there 
is a search for practical techniques that are more efficient (in bits per 
cycle) than QAM. This search is intensifying because of the growing
demand for data communication services over bandlimited channels
and because of the continuing drop in the cost of high-speed processing 
required by advanced communication methods. 

The bandlimited channel with additive white Gaussian noise (power
$\mathcal{N}$), where the average transmitter power $\mathcal{P}$
($\mathcal{P} \gg \mathcal{N}$) is constrained, serves as a proving
ground for theoretical explorations of the relative efficacy of proposed
techniques. Specific methods have been discussed as candidates for
improving efficiency. Three candidates are Higher-Dimensional
Constellations (HDCs),¹ Ungerboeck's Trellis Coding (UTC),² and Faster
Binary Signaling (FBS).³ Both HDC and UTC have lent themselves to
analysis and their significant value over the QAM method has been
established. Moreover, we are beginning to understand the relative
value of UTC over HDC.⁴ On the other
hand, the effectiveness of FBS has hitherto remained a mystery. In 
Ref. 5* some theoretical results on FBS for some special pulses are 
presented but the relative effectiveness issue is not settled. 

To understand the FBS method, consider the elementary QAM 
method, 4-PSK (phase-shift keying), in a situation comfortably meeting
a stringent probability of error ($P_e$) constraint. If the bit rate
requirement increases it can be met without expanding bandwidth by 
increasing the number of points in the QAM constellation. FBS is a 
natural alternative means of transmitting at higher bit rates. In FBS, 
one fixes the constellation at four points and increases the symbol 
rate as much as necessary. Maximum Likelihood Sequence Detection 
(MLSD) is employed to overcome the consequent Intersymbol Inter- 
ference (ISI) in the best way possible (see Refs. 6 through 8 for 
treatments of MLSD). The minimum separation between distinct 
points in the planar FBS constellation is greater than for the QAM 
constellation. One might say FBS trades freedom from ISI for added 
noise immunity and then MLSD is used to mitigate the ISI. 

With the efficacy of FBS unknown, it looms as a possibly competi- 
tive technique. Here we help the process of evaluating the field of 
candidate methods for moving beyond the capabilities of QAM by 
showing how to analyze FBS. We show by examples that FBS is, at 
best, of marginal value relative to QAM, even if one allows for 
implementation capabilities far beyond those of forthcoming processors.
Specifically, we allow for complexity of up to $10^8$ states. Given
the strides in processor technology in the last few decades and the
imminent hardware advances, a number like $10^8$ is chosen so that this
paper will not be outdated for a long time. Such prudence is needed.
Indeed, the analysis that follows leaves open the possibility that FBS
would offer substantial improvement over QAM if complexity were
not a consideration. Moreover, one must also consider that fast
detectors could go far beyond conventional MLSD in processing efficiency.⁹

* The terminology “faster binary signaling” is not used in [A] and [M]. In [M] the
term “faster than Nyquist signaling” is used.


II. SYSTEM DESCRIPTION
2.1 Transmission medium and its use 


The data transmission medium is represented here in the simplest 
idealized form as a lossless characteristic with additive white Gaussian 
noise. The channelization is shown in Fig. 1. On each channel the 
transmitted signal is subject to an average power constraint and it is 
assumed that, for all the systems that we discuss, the average trans- 
mitted power is much greater than the noise power. [The ramification 
of this high signal-to-noise ratio (s/n) assumption is discussed in 
Section III.] In the analysis that follows, we assume that the channels 
are isolated from each other in that it is not permitted to mitigate 
Adjacent Channel Interference (ACI) through some elaborate scheme 
requiring coordination among channels. It is required that R bits per 
second be transmitted over the channel. Whatever the form of modu- 
lation used, soft maximum likelihood sequence detection is employed 
at the receiver.¹⁰


2.2 Comments about the benchmark QAM method 


It is because of the current prominence of QAM in applications that 
the work here is presented with QAM as the benchmark method. Since 
a flat channel transfer characteristic is assumed, it is trivial to relate 
results to the equivalent baseband channel representing a QAM rail. 
We use $M = m^2$ to denote the number of points in the QAM constellation.
So, constellations with 16, 64, 256, and 1024 points correspond
to FBS operating at 2, 3, 4, and 5 times the conventional rate. We 
elected to work with square QAM constellations even though certain 
departures from such constellations yield superior performance.’ The 
reason for our choice is that we want to analyze FBS in isolation and 
the aforementioned departures can be viewed as the first step in using 
the HDC method. Finally, we employ the harmless expediency of 
dealing with M as if it is a continuous variable in our calculations and 
in some of our graphs. When M is a positive integer that is not a
perfect square, one can find a two-dimensional constellation that
realizes the situation covered by the analysis.

Fig. 1—Adjacent passband channels of bandwidth $\mathcal{W} = 2W$.
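The correspondence between square QAM constellations and FBS speed factors stated above can be checked with a few lines. This is only an illustrative sketch: the function names are hypothetical, and it assumes an integer speed factor p, so that a four-point FBS constellation sent p times faster carries the same 2p bits per conventional symbol interval as a QAM constellation with $4^p$ points.

```python
def qam_points(p):
    """Points M in the square QAM constellation matched by FBS at p times the conventional rate."""
    return 4 ** p


def levels_per_rail(p):
    """Amplitude levels on each rail of that square constellation (M = levels squared)."""
    return 2 ** p


if __name__ == "__main__":
    for p in (2, 3, 4, 5):
        print(p, qam_points(p), levels_per_rail(p))
```

Treating M as a continuous variable, as the text does, amounts to allowing non-integer p in the same relation.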

Throughout we will assume that the standard QAM method is 
meeting a P, requirement (107° or less). When we compare alternatives 
to the QAM method, we will associate primed variables with parame- 
ters of the non-QAM system where needed to avoid ambiguity. 


2.3 FBS 


The generic system structure depicted in Fig. 2 is interpreted here 
for the special case of FBS. For FBS the binary data are blocked into 
successive 2-bit words. A pair of independent, synchronous, delta 
function streams are formed using the two bits to randomly sign the 
pair of delta functions that are input to the pair of baseband filters. 
The baseband filters nominally cut off at W cycles/second. On each 
rail the pulse rate is r = R/2. For convenience we use T' = 1/r to
denote the time interval between impulses. The filter outputs combine
to form the in-phase and quadrature rails of a passband signal. Thus,
it may seem that what we have is 4-PSK; but FBS is unusual in that
its symbol rate, 1/T', is higher than the conventional 1/T = 2W.

The higher rate does not increase bandwidth but it does cause ISI, 
which will be combatted with MLSD. The ISI is assumed to involve a
memory of $\nu$, i.e., the bandlimited baseband filters are such that the
impulse response sampled at times $\{nT'\}_{n=-\infty}^{\infty}$ has the form
$h = \ldots, 0, h_0, h_1, \ldots, h_\nu, 0, \ldots$. Thus MLSD requires
$2^{\nu+1}$ states ($2^\nu$ per rail).*

Fig. 2—Generic system structure. For QAM the transmit and receive filters are low-
pass filters. For the FBS case they are a matched pair with finite memory in the sampled
data domain. The nominal cutoff frequency of the low-pass filters is $W = \mathcal{W}/2$.
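To give a feel for how quickly the detector grows with memory, the sketch below tabulates the state counts implied by the text: $2^\nu$ states per rail and $2^{\nu+1}$ in total for the two independent rails, against $4^\nu$ for a jointly detected (non-product) constellation. The function name is illustrative only.

```python
def mlsd_state_counts(nu):
    """Return (states per rail, total for two independent rails, joint-detection states)."""
    per_rail = 2 ** nu
    two_rails = 2 * per_rail      # 2**(nu + 1): one detector per rail
    joint = 4 ** nu               # if the constellation were not a product of two rails
    return per_rail, two_rails, joint


if __name__ == "__main__":
    for nu in (4, 14, 20, 26):
        print(nu, mlsd_state_counts(nu))
```

For a memory of 26 the total is 134,217,728 states, which is the order of magnitude the paper treats as the outer limit of complexity.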

We would like to, if we could, make the following idealizations 
concerning H(f), the Fourier transform of the impulse response of the 
baseband filter. 

A. Each member of a bank of FBS passband systems is spectrally
disjoint [in baseband, H(f) vanishes outside (−W, W)].

B. H(f) is a trigonometric polynomial of degree $\nu$ on −W' < f < W'
[W' = 1/(2T') > W].
The first of the above is needed for spectral efficiency. Statement B
is needed to be consistent with the assumption that MLSD involves a
memory of $\nu$. Mathematically it would seem impossible to meet A and
B. After all, if H(f) is a trigonometric polynomial vanishing on
[W, W'], it vanishes everywhere. There is no real difficulty here. We
will adhere to B with $\nu$ the degree of H(f). While A will not strictly
hold, one can get as close to ideal as desired as long as $\nu$ is large enough
to meet out-of-band energy constraints. In MLSD a matched filter is
used to initiate the detection process. The matched filter receiver 
serves to select the desired band [-W, W]. We have a dual view of 
what the frequency band is. From the point of view of where the signal 
power is concentrated, [—W, W] is the band. From the point of view 
of MLSD, we are dealing with a sampled temporal response, which 
can only correspond to a transform that is a polynomial on [—W’, 
W’]. So long as the degree of the polynomial is large enough the two 
views of bandwidth can be reconciled. 

There are several questions before us. Can we make the energy 
outside the [-W, W] band so small that the interference between 
neighboring systems is negligible, yet the number of states involved in 
MLSD is reasonable? If we can accomplish this, does the FBS system 
perform better than the comparable QAM system? How much better 
does it perform and at what complexity? 

We investigate these questions in the context of three kinds of
discrete impulse responses. The first are the Nyquist responses
(“brickwall spectra” on [−W, W]) truncated to memory $\nu$. For these
we shall see that the interference from neighboring systems is
prohibitive for reasonable $\nu$. The second set of impulse responses are the
discrete prolate spheroidal wave functions, which, for fixed total energy
and fixed $\nu$, have the least interference from neighboring systems. We
demonstrate that their performance is not good. Finally we explore
optimally designed responses and find that for reasonable $\nu$, even for
the most favorable cases, the advantage over QAM is very modest.


* Notice that if the constellation were not the product of two one-dimensional
constellations, as we have assumed, the complexity would grow as $4^\nu$ instead of $2^{\nu+1}$.

III. PERFORMANCE CRITERION
3.1 Definition of gain 


The probability of bit error, $P_e$, is an important performance measure
for data communication systems. For QAM as well as the systems
employing HDC, UTC, or FBS, if MLSD is used the probability of bit
error decays exponentially as the noise spectral density is decreased
(except for an algebraic multiplier). That is, an exponentially tight
bound on $P_e$ has the form

$$P_e \le \kappa \sigma e^{-\xi/\sigma^2} \quad (\sigma \to 0),$$

where $\kappa$ and $\xi$ are independent of $\sigma^2$, the noise power spectral density
on a single dimension. For FBS viewed in the MLSD context, the
exponentially tight bound has the form

$$P_e \le \kappa \sigma e^{-d_{\min}^2/(2\sigma^2)} \quad (\sigma \to 0).$$

The minimum distance is defined by

$$d_{\min}^2 \triangleq \min \|h * e\|^2, \tag{1}$$

where * denotes convolution and the minimum is over all doubly
infinite sequences e of the form

$$\ldots, 0, e_0, e_1, \ldots, e_K, 0, \ldots,$$

where $e_k$ belongs to {0, 1, −1} and K can be any nonnegative integer.
Clearly, $d_{\min}^2 \le \|h\|^2$. In cases where $d_{\min}^2 = \|h\|^2$ it is common


to say that the matched filter bound is attained. What is meant is that 
the exponent of P, is the same as if there were only a single data pulse 
to be detected (no ISI). The terminology stems from the fact that, 
when there is no ISI, MLSD employs simply a matched filter (along 
with a threshold comparator).’” 
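As a concrete reading of the minimum-distance definition, the following sketch enumerates error events of bounded length and returns the smallest value of the squared norm of h convolved with e. It is a brute-force illustration only, not the search used in the paper; the function names and the `max_len` cutoff are assumptions, and the first error symbol is fixed to +1 because the norm is unchanged by an overall sign flip.

```python
from itertools import product


def convolve(h, e):
    """Plain discrete convolution of two finite sequences."""
    out = [0.0] * (len(h) + len(e) - 1)
    for i, hi in enumerate(h):
        for j, ej in enumerate(e):
            out[i + j] += hi * ej
    return out


def min_distance_bounded(h, max_len):
    """Smallest ||h * e||^2 over error events e with e_0 = 1 and at most max_len symbols."""
    best, best_event = float("inf"), None
    for tail_len in range(max_len):
        for tail in product((0, 1, -1), repeat=tail_len):
            e = (1,) + tail
            d2 = sum(x * x for x in convolve(h, e))
            if d2 < best - 1e-12:
                best, best_event = d2, e
    return best, best_event


if __name__ == "__main__":
    h = [1.0, 2.0, 1.0]
    d2, event = min_distance_bounded(h, 5)
    energy = sum(x * x for x in h)
    print(d2, event, d2 / energy)   # ratio below 1: matched filter bound not attained
```

For h = (1, 2, 1) the pulse energy is 6 but the event (1, −1) already gives 4, so the matched filter bound is not attained. For h = (1, 1) the minimum equals the pulse energy, 2.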

We take the quantity $\xi$ as a convenient indicator of performance in
the high s/n realm. (We stress that, for models of specific systems,
more refined computations estimating the actual error probability are
often needed.) The “gain” of one system over another is expressed as

$$G = 10 \log_{10} \frac{\xi'}{\xi}.$$


We shall be concerned in this paper with estimating the gain that FBS 
exhibits over QAM. Both UTC and HDC exhibit substantial gain over 
QAM. For UTC, gains in the range of 3 to 6 dB have been reported 
and, for a 3-dB gain, the required complexity is extremely reasonable.” 

For the conventional QAM system, $\|h\|^2$ denotes the energy per
pulse prior to multiplication by $a_i$ belonging to
$\{\pm 1, \pm 3, \ldots, \pm(L-1)\}$, so the modulated pulse has average
energy $[(L^2-1)/3]\,\|h\|^2$. If there is one symbol every T seconds, the
average signal power is $[(L^2-1)/3]\,[\|h\|^2/T]$. Since the information rate
per rail is $(\log_2 L)/T$ b/s, FBS must operate at a rate of $(\log_2 L)/T$
pulses/s. Let h' be the impulse response for FBS. For the two systems to
have identical signal power we must have

$$\|h'\|^2 = \frac{(L^2-1)\,\|h\|^2}{3 \log_2 L}.$$

For FBS, accounting for the wider bandwidth, the noise variance per
sample is $\sigma^2 (\log_2 L)/T$. Not necessarily all of $\|h'\|^2$ is realized in the
error exponent.

The gain over the corresponding QAM system is expressed as

$$G = 10 \log_{10} \left[ \frac{d_{\min}^2}{\|h'\|^2} \, \frac{L^2-1}{3 \log_2 L} \right] = 10 \log_{10} \left[ \frac{d_{\min}^2}{\|h'\|^2} \, \frac{4^p-1}{3p} \right], \tag{2}$$

where $p = \log_2 L = W'/W = T/T'$. One could interpret
$10 \log_{10}[(4^p-1)/(3p)]$ as a noise immunity gain and
$10 \log_{10}(d_{\min}^2/\|h'\|^2)$ as the penalty for ISI.

Shortly it will prove useful to allow for replacing the noise power
spectral density on the FBS system by a level greater than that on the
slower system, say $(1+\beta)\sigma^2$ with $\beta > 0$ in place of $\sigma^2$. This will enable
us to compensate for interference from adjacent channels. When, and
if, the matched filter bound is attained the gain is expressed by
$10 \log_{10}\{(4^p-1)/[3p(1+\beta)]\}$. Figure 3 depicts this function with $\beta$ as a
parameter. It is evident from the $\beta = 0$ curve that, depending on p, if
the matched filter bound is attained, the gain can be considerable.
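The gain expression is easy to evaluate numerically. The sketch below splits it into the noise-immunity factor $(4^p-1)/(3p)$, the ISI factor (minimum squared distance divided by FBS pulse energy, 1 when the matched filter bound is attained), and the optional noise inflation $1+\beta$. The function and argument names are illustrative, not the paper's.

```python
import math


def fbs_gain_db(p, distance_ratio=1.0, beta=0.0):
    """Gain of FBS over QAM in dB.

    p              -- speed factor (FBS symbol rate / conventional rate)
    distance_ratio -- d_min^2 / ||h'||^2, equal to 1 if the matched filter bound is attained
    beta           -- fractional increase in effective noise level (adjacent-channel interference)
    """
    return 10.0 * math.log10(distance_ratio * (4 ** p - 1) / (3 * p * (1.0 + beta)))


if __name__ == "__main__":
    for p in (1, 2, 3, 4, 5):
        print(p, round(fbs_gain_db(p), 2), round(fbs_gain_db(p, beta=1.0), 2))
```

At p = 1 the gain is 0 dB, as it must be, since FBS then coincides with 4-PSK. At p = 2 the ceiling is about 3.98 dB, and doubling the effective noise ($\beta = 1$) removes about 3 dB of it.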

We consider now the interference in the band (−W', W') that stems
from those channels (other than the primary channel centered at zero)
whose power spectral density is nonzero in (−W', W'). The determination
of the additional power due to these interfering channels is
straightforward. When measured for a single rail, at the output of the
matched filter, $H'^{*}(f)$, the power is the same as if $\sigma^2$ were replaced by

$$\sigma^2 + \frac{\displaystyle \int_{-W'}^{W'} \Big[ \sum_{k \ne 0} |H'(f - k\mathcal{W})|^2 \Big] \, |H'(f)|^2 \, df}{\displaystyle \int_{-W'}^{W'} |H'(g)|^2 \, dg}. \tag{3}$$

The sum is over all neighboring systems overlapping the (−W', W')
band. Because of its genesis, the term that adds to $\sigma^2$ in (3) is called
the Adjacent-Channel Interference term or ACI. For H'(f) with nearly
all the energy in the (−W, W) band, $|H'(f)|^2 / \int |H'(g)|^2 \, dg$ has a
mean value of approximately T on (−W, W). Since each channel is
symmetrically disposed relative to its neighbors, the integral in the
numerator of (3) can be replaced by $\int_{-W}^{W}$. It follows that the strength
of the ACI term in (3) is roughly indicated by that energy in a pulse
with transform H'(f) that is out of the band (−W, W). (We use OBE
to denote out-of-band energy.) This approximation becomes more
precise if H'(f) is approximately flat in (−W, W) (as is the case in
Section IV).

Fig. 3—Adjusted gain versus speed factor when matched filter bound is attained.
Adjusted gain = $10 \log_{10}\{(4^p-1)/[3p(1+\beta)]\}$, with $\sigma'^2 = (1+\beta)\sigma^2$.
[Plot: adjusted gain in decibels versus speed factor, p.]

If ACI is not negligible, it is reasonable to modify the gain by
subtracting $10 \log_{10}(1 + \mathrm{OBE}/\sigma^2)$ or, the more precise but more
complex, $10 \log_{10}(1 + \mathrm{ACI}/\sigma^2)$. No matter which gain expression is
most appropriate, if we insist on some degree of spectral isolation it is
unclear how much gain can be attained at specific levels of complexity
($2^{\nu+1}$ with $\nu \le 26$). In the sequel we will find that the answer is not
much gain. For the analysis in Sections IV and V we consider OBE in
the range $[0, \sigma^2]$. As we note in Section VI there is no point in
considering OBE outside this range.


3.2 Error events 


We can see from the formula for minimum distance (1) that, if we
slide a window of size $\nu$ along an error sequence e, a repeated state is
forced to occur by $3^\nu$ shifts. It follows that, to attain the minimum,
one need not search over more than $3^{3^\nu}$ events. Since
$\{3^{3^\nu} \mid \nu = 1, 2, 3, 4, \ldots\} = \{27,\ 19683,\ 7.63 \times 10^{12},\ 4.4 \times 10^{38}, \ldots\}$,
we see that, even for $\nu = 3$, a brute force search is extremely ambitious
and for $\nu = 4$ it is completely out of the question. (Reference 13
discusses three other state symmetries as well as a repeat.)
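The growth of this bound can be reproduced directly with exact integer arithmetic. The snippet is only a check of the arithmetic quoted above; the function name is illustrative.

```python
def brute_force_event_bound(nu):
    """Upper bound 3**(3**nu) on the number of error events to examine for memory nu."""
    return 3 ** (3 ** nu)


if __name__ == "__main__":
    for nu in (1, 2, 3, 4):
        print(nu, 3 ** nu, "%.3g" % brute_force_event_bound(nu))
```

The jump from about $10^{13}$ at memory 3 to about $10^{38}$ at memory 4 is what rules out exhaustive search and motivates the pruned tree search of the next subsection.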

For future reference, we borrow from Ref. 13 and list four useful 
representations and notations for error events in Appendix A. 


3.3 Searching the tree of error events for $d_{\min}^2$


For each $\nu$, it is useful to view a set of error events, one of which is
guaranteed to achieve $d_{\min}^2$, as a tree. Construct a tree of sequences
with three branches emanating out of each node and with the labeling
illustrated in Fig. 4. The labels along each upward path represent the
beginning string for the nonzero portion of an error event. Once a
string of $\nu$ consecutive zeros is encountered, the growth out of such a
node is pruned from the tree, since continuing the event with nonzero
elements will correspond to creating labelings for beginnings of events
with a greater $\|h * e\|^2$ than the all-zero continuation.

To envision a computer search for $d_{\min}^2$ for a specific h, one can
think of climbing up the tree and to the left and at each node
computing the accumulated

$$\sum (h^R, A_i)^2 \quad \text{(notation in Appendix A)}$$

on the upward path to the node. There is one summand for each node
in the upward path. Climb higher if the record low for a completed
error event (a number $\le \|h\|^2$) has not been exceeded. Otherwise,
climb down to the first node that offers an unclimbed branch and then
climb that branch. Whenever a node with $\nu$ consecutive zeros is
reached, and the old record has not been reached, record the new
candidate event for achieving $d_{\min}^2$ and the new record before climbing
down.

Fig. 4—Error event tree.
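The climbing procedure can be sketched as a depth-first search with pruning. This is a simplified illustration under stated assumptions, not the paper's program: it fixes the first error symbol to +1, accumulates the squared outputs that are already final at each node, abandons a path once the running sum reaches the current record, and closes an event when a run of zeros as long as the memory is seen. The `max_depth` cutoff is an assumption added so the sketch always terminates; the paper instead relies on repeated-state and symmetry tests, which are not implemented here.

```python
def min_distance_tree(h, max_depth=24):
    """Branch-and-bound search for d_min^2 of a finite impulse response h = [h_0, ..., h_nu]."""
    nu = len(h) - 1
    best = sum(x * x for x in h)          # record set by the single-error event
    best_event = (1,)
    if nu == 0:
        return best, best_event

    def climb(e, acc, zero_run):
        nonlocal best, best_event
        if zero_run == nu:                # nu trailing zeros: the event is complete
            if acc < best - 1e-12:
                best, best_event = acc, tuple(e[:len(e) - nu])
            return
        if len(e) >= max_depth:           # safeguard of this sketch only
            return
        for sym in (0, 1, -1):
            e.append(sym)
            n = len(e) - 1
            # Output sample n depends only on symbols already chosen.
            y = sum(h[k] * e[n - k] for k in range(min(nu, n) + 1))
            new_acc = acc + y * y
            if new_acc < best - 1e-12:    # otherwise prune this branch
                climb(e, new_acc, zero_run + 1 if sym == 0 else 0)
            e.pop()

    climb([1], h[0] * h[0], 0)
    return best, best_event


if __name__ == "__main__":
    print(min_distance_tree([1.0, 2.0, 1.0]))
    print(min_distance_tree([1.0, 1.0]))
```

For h = (1, 1) the alternating path never accumulates more than 1, so without a repeated-state test (or the depth cutoff used here) the climb would not end. This is one reason the symmetry tests described in the text matter in practice.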

Some additional special search tools prove useful. Specifically, one
can terminate an upward climb whenever any of the four symmetries
$A_i = \pm A_j$ (i < j) or $A_i = \pm A_j^R$ (i < j) is detected (see Ref. 3). Two other
search tools are very powerful, one for symmetrical h (see Appendix
B) and one for nonsymmetrical h, which is discussed in Section 6.3.

Searching for $d_{\min}^2$ in the manner described will enable us to
investigate the efficacy of FBS. In Sections IV, V, and VI, we consider three
classes of FBS pulses.


IV. NYQUIST PULSES 
4.1 Performance 


One of the most elementary results in data communication theory
is Nyquist’s result that, for WT = 1/2, signals of the form
$\sum_n a_n h(t - nT)$ with $h(t) = \sin t / t$ are ISI-free. In FBS, we replace
T by T/p with p > 1 and, as we have already mentioned, the signal
bandwidth is invariant to p but ISI arises. A hypothetical FBS system
based on such a pulse incorporates a level of idealization beyond the
standard one associated with the abrupt cutoff. Namely, the system
represents the limits of infinite decoding complexity as well as zero
energy in the bands, W < |f| < W'.

For each system, assuming the pulse energy is normalized to 1 and
W' is normalized to 1/2, we have

$$d_{\min}^2 = \inf \frac{p}{2\pi} \int_{-\pi/p}^{\pi/p} \left| 1 - \sum_{k=1}^{K} \epsilon_k e^{ik\omega} \right|^2 d\omega. \tag{4}$$

The infimum is over all error events with $\epsilon_k$ belonging to {0, 1, −1}
and K ranging over the positive integers. Expression (4) is considered
in Ref. 5 where it is demonstrated that $d_{\min}^2 > 0$ for all $p \ge 1$. The
intriguing question of whether, for such systems, a positive “gain” is
available remains open.
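For a given error event, the band-limited integral can be evaluated in closed form. Assuming the unit-energy brickwall pulse (so that the squared transform equals p inside the band), the squared distance of an event with symbols $e_m$ is the double sum of $e_m e_n \,\mathrm{sinc}((m-n)/p)$, with sinc(x) = sin(πx)/(πx). The sketch below uses that identity; the function names are illustrative, and the gain helper applies the noise-immunity factor $(4^p-1)/(3p)$ from Section 3.1.

```python
import math


def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def nyquist_event_distance(e, p):
    """Squared distance of error event e for the unit-energy ideal Nyquist pulse at speed factor p."""
    return sum(em * en * sinc((m - n) / p)
               for m, em in enumerate(e) for n, en in enumerate(e))


def gain_bound_db(e, p):
    """Upper bound on the FBS gain (dB) implied by one candidate error event."""
    return 10.0 * math.log10(nyquist_event_distance(e, p) * (4 ** p - 1) / (3 * p))


if __name__ == "__main__":
    print(nyquist_event_distance([1], 3))                    # single error: full pulse energy
    print(nyquist_event_distance([1, -1], 1))                # no ISI at the Nyquist rate
    print(gain_bound_db([1, -1, 0, 1, -1, 0, 1, -1], 2))     # a candidate event at p = 2
```

Any single event only bounds the infimum from above, so a small value here shows that the achievable gain is small, while a large value proves nothing.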
Let

$$H^{NY}(\omega) \triangleq \begin{cases} \sqrt{p}, & |\omega| < \pi/p \\ 0, & \text{otherwise} \end{cases}$$

denote the FBS pulse transform normalized to unit energy. While
expression (4) allows for infinite complexity, for any implementation,
approximations of $H^{NY}$ must be considered. The optimum least-mean-
square approximations, the Fourier series
$\{H_\nu^{NY}(\omega) \triangleq \sum_{n=0}^{\nu} q_n e^{in\omega}\}$, are
natural responses to use to inquire whether, as $\nu$ increases, FBS
performs better than QAM. Of course, if $\nu$ becomes too large the
required detector becomes forbiddingly complex.
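A minimal sketch of these truncated responses follows, under two assumptions that are not spelled out in the text: the memory is even and the taps are taken symmetrically about the pulse center (a pure delay relative to the indexing from 0), and the brickwall transform has height $\sqrt{p}$ so that its Fourier coefficients are $\mathrm{sinc}(n/p)/\sqrt{p}$. The second function estimates the fraction of the truncated response's energy that falls outside the band. All names are illustrative.

```python
import math


def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def truncated_nyquist_taps(nu, p):
    """nu + 1 centered Fourier coefficients of the unit-energy brickwall of half-width pi/p."""
    if nu % 2:
        raise ValueError("this sketch assumes even memory nu (symmetric truncation)")
    half = nu // 2
    return [sinc(n / p) / math.sqrt(p) for n in range(-half, half + 1)]


def out_of_band_fraction(q, p):
    """Fraction of the energy of taps q lying outside |omega| < pi/p."""
    total = sum(x * x for x in q)
    in_band = sum(qm * qn * sinc((m - n) / p) / p
                  for m, qm in enumerate(q) for n, qn in enumerate(q))
    return 1.0 - in_band / total


if __name__ == "__main__":
    for nu in (14, 20, 26):
        q = truncated_nyquist_taps(nu, 2)
        print(nu, 10.0 * math.log10(out_of_band_fraction(q, 2)))
```

Because the sinc coefficients decay only as 1/n, the out-of-band energy of the truncated response falls slowly with memory, which is the difficulty discussed next.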

The tree search discussed in Section 3.3 was used to determine the 
“gain” for the least-mean-square approximations. The symmetry of 
the impulse responses allowed the addition of the test of Appendix B 
to significantly reduce the running time of the algorithm. 

The results are shown in Fig. 5. The “apparent gains” are only 
meaningful if the Out-of-Band Energies (OBEs) are sufficiently small. 
Indeed, they are not sufficiently small as we now discuss. Figure 6 
shows the out-of-band energy for a unit energy response for v = 26. 
Also shown are the noise levels for the benchmark QAM system 
providing the same information rate at a P, of 10-* and 10°°. Of the 
four points p = 2, 3, 4, and 5, only p = 2 shows the out-of-band energy 
below the noise level. The margin for P, = 107° is slight (=4 dB) but, 
from Fig. 5, we see that for $\nu = 26$ the “gain” is negligible (~0.1 dB).
For p = 2, if we reduce $\nu$ to increase the “gain”, the attempt is
undermined by the increase in out-of-band energy. The out-of-band
energies for $\nu = 20$ and $\nu = 14$ are also shown in Fig. 6, for p = 2.

We conclude that, for the least-mean-square approximation of a 
Nyquist pulse, FBS signaling under the mild requirement P, = 107° 
does not offer any significant gain over QAM. In making the comparison,
we have allowed FBS the extraordinary complexity of $2^{26} \approx 6 \times 10^7$
states per rail (>$10^8$ states total). If $P_e$ were decreased, FBS would
fare even worse.

Figure 7 illustrates approximate Nyquist spectra for v = 26 and 
minimizing error events. We note that for »y = 26 the number of 
candidate error events exceed 10%” < 3°, 


4.2 Infinite complexity, complete spectral confinement asymptote 


Allowing for complexity not exceeding $10^8$ states, FBS is not attractive
relative to QAM for the examples considered thus far. As we
mentioned in the last section, for the asymptote of infinite complexity
($\nu \to \infty$) and stringent out-of-band energy constraint, the limiting
squared distance is expressed in (4). Although we cannot compute the
gain $G_p$, we can find an upper bound using candidate error events. We
used events revealed to be useful in the tree search for $\nu \le 26$. The list
of trigonometric polynomials $\{E_p(\omega)\}_{p=2}^{5}$ below was used to bound $d_{\min}^2$:


$$E_2(\omega) = 1 - e^{i\omega} + e^{i3\omega} - e^{i4\omega} + e^{i6\omega} - e^{i7\omega}$$
$$E_3(\omega) = 1 - e^{i\omega} + e^{i4\omega} - e^{i5\omega}$$
E,(w) = (1 — e@@ — ee” + e¥* + etl? — eF*)(1 + ell) 


$$E_5(\omega) = 1 - e^{i\omega} - e^{i2\omega} + e^{i3\omega} + e^{i4\omega} - e^{i5\omega}.$$

They yield $G_2 < 0.107$ dB, $G_3 < 1.4$ dB, $G_4 < 0.477$ dB, and $G_5 < 1.1$
dB. This shows that, even allowing for an arbitrarily large number of
states, in the limit of stringent out-of-band energy requirements, the
gains available using a Nyquist pulse are at best very modest. The
$E_p(\omega)$ characteristics are illustrated in Fig. 8.

Fig. 5—Apparent gain of FBS over QAM (gains unachievable because of interference from adjacent bands).

Fig. 6—Comparison of interference levels.


V. DISCRETE PROLATE SPHEROIDAL WAVE FUNCTIONS 


In Section IV we investigated whether the approximations $H_\nu^{NY}$ have
good distance properties for FBS with p = 2, 3, 4, and 5. We found
that, for reasonable complexity and out-of-band energy constraints,
they do not. The $H_\nu^{NY}$ minimize
$\int_{-\pi}^{\pi} |H^{NY} - \sum_{n=0}^{\nu} q_n e^{in\omega}|^2 \, w(\omega) \, d\omega$ in
the special case when the weight function $w(\omega) = 1$. In light of the
results of Section IV, we can reformulate the least-mean-square
approximation using a $w(\omega)$ that is 0 on $(-\pi/p, \pi/p)$ and 1 otherwise.
The weighting reflects the fact that it is essential to keep the out-of-
band energy small but, having seen that the flat transform has no
special distance properties, we have no motivation for keeping the
transfer characteristic flat within $(-\pi/p, \pi/p)$.

Fig. 8—Extremal error event for (a) p = 2, (b) p = 3, (c) p = 4, and (d) p = 5.

The extremal responses so obtained are called the Discrete Prolate 
Spheroidal Wave Functions (DPSWF).”* Their theory has been devel- 
oped by Slepian.’° Wyner has suggested their consideration for use in 
data communication systems for reasons other than those we are 
considering here.’ Let $H_{\nu,p}^{PS}$ denote the transform of the discrete
prolate spheroidal wave function of memory $\nu$ corresponding to an FBS
system with parameter p. Since minimizing the out-of-band energy
corresponds to maximizing the in-band energy, the coefficients of $H_{\nu,p}^{PS}$ are
then components $(q_0, q_1, \ldots, q_\nu)$ of the eigenvector of the matrix of the
symmetric quadratic form

$$\frac{1}{2\pi} \int_{-\pi/p}^{\pi/p} \left| q_0 + q_1 e^{i\omega} + \cdots + q_\nu e^{i\nu\omega} \right|^2 d\omega$$

corresponding to the largest eigenvalue, $\lambda_{\nu,p}$. Since we normalize by
constraining $H_{\nu,p}^{PS}$ to have unit energy, the quantity $1 - \lambda_{\nu,p}$ is the out-
of-band energy.
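The eigenvalue problem can be illustrated with a few lines of plain Python. In this sketch the matrix of the quadratic form has entries $\mathrm{sinc}((m-n)/p)/p$, which follows from integrating $e^{i(m-n)\omega}$ over the band with the $1/(2\pi)$ normalization. The largest eigenvalue is found by power iteration started from the all-ones vector, an arbitrary choice made here for simplicity rather than the paper's method. Names are illustrative.

```python
import math


def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def dpswf_out_of_band(nu, p, iters=500):
    """Return (largest eigenvalue, unit-norm eigenvector, out-of-band energy in dB)."""
    n = nu + 1
    a = [[sinc((m - k) / p) / p for k in range(n)] for m in range(n)]
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(a[m][k] * v[k] for k in range(n)) for m in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    av = [sum(a[m][k] * v[k] for k in range(n)) for m in range(n)]
    lam = sum(x * y for x, y in zip(v, av))          # Rayleigh quotient
    # For large nu, 1 - lam falls below double precision; clamp to keep log10 defined.
    obe_db = 10.0 * math.log10(max(1.0 - lam, 1e-300))
    return lam, v, obe_db


if __name__ == "__main__":
    for nu in (2, 4, 6):
        lam, _, obe_db = dpswf_out_of_band(nu, 2)
        print(nu, lam, round(obe_db, 2))
```

For p = 2 and memory 4 this gives an out-of-band energy of about −26.3 dB. For much larger memories the quantity $1 - \lambda$ becomes smaller than double-precision rounding, so a more careful numerical method would be needed there.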

In Fig. 6, $10 \log_{10}(1 - \lambda_{\nu,p})$ is plotted against p for various $\nu$ (see
dashed curves). Unlike $H_\nu^{NY}$ we see that a significant portion of the
loci for $H_{\nu,p}^{PS}$ are disposed well above curves for the noise levels for
P, = 107° and P, = 10~*. Thus, there are spaces of systems of moderate 
complexity with small out-of-band energy, whose distance properties 
are of interest. What are the distance properties of $H_{\nu,p}^{PS}$ in the range
p = 2, 3, 4, 5? They are not good. Use of the search algorithm of
Section 3.3 demonstrated no gain for any $H_{\nu,p}^{PS}$ whose ($\nu$, p) coordinate
corresponded to an out-of-band energy below the level of the Gaussian 
noise for P, of 10~*. For example, for p = 2 at v = 4, the out-of-band 
energy is —26.3 dB, which is below the level of the additive noise. 
However, the minimum distance of $H_{4,2}^{PS}$ is poor, specifically G = −1.33
dB. For larger $\nu$, the out-of-band energy drops precipitously but
distance decreases as well. As p increases in the range 3, 4, and 5, the
situation worsens: G values significantly below 0 dB occur with
out-of-band energy prohibitively above the noise level. As $\nu$ is increased,
the distance drops markedly.

At this point we have an interesting situation. The least-mean- 
square approximations to the Nyquist pulse are shown to have attrac- 
tive “apparent gains” relative to QAM but the gains cannot be realized 
because the signal spectrum is not adequately confined. On the other 
hand, the results for DPSWF’s show that great spectral confinement 
is possible, but these pulse shapes do not exhibit any gain over QAM. 
The question remains as to how much gain we can achieve under a 
spectral confinement constraint for a specific complexity. This is 
addressed in Section VI. 


VI. OPTIMUM PULSE DESIGN 


In this section we investigate the performance of optimally designed 
FBS responses of prescribed complexity (i.e., prescribed memory $\nu$). 
By optimum we mean that the minimum distance is maximized. The 
transmitted power, which is proportional to 
$\frac{1}{2\pi}\int_{-\pi}^{\pi} |H(\omega)|^2\,d\omega$, 
and the out-of-band energy, which is proportional to 
$\frac{1}{\pi}\int_{\pi/\rho}^{\pi} |H(\omega)|^2\,d\omega$, are both constrained. 
Once the optimum $\mathbf{h} = h * h^b$ 
(equivalently $\mathcal{H} = |H(\omega)|^2$) is found, we factor $\mathcal{H}$ to determine an 
optimal $h$. (Since $\mathcal{H}$ can have $2\nu$ zeros disposed in inverse conjugate 
pairs there can be as many as $2^{\nu}$ possible factors of $\mathcal{H}$ that have real 
coefficients.) The problem of finding optimal h is essentially a linear 
programming (LP) problem. The suggestion of viewing optimum 
MLSE system design as an LP problem appears in my paper with R. 
R. Anderson.?? It turns out that, in most cases of interest, the number 
of constraints corresponding to the various error events is too large 
for the LP to be useful by itself. The LP is combined with the tree 
search algorithm that serves to eliminate most error events from 
consideration. The LP-tree search algorithm solves the design prob- 
lem. We proceed now to describe the LP and show how it is integrated 
with a tree search algorithm. Then we present the performance results 
for optimally designed pulses. 


6.1 Linear program 


Recall that an LP problem is one of the following type: Given a 
vector $c$, find a vector $y$ that maximizes $(y, c)$ subject to a set of linear 
constraints of the form $(y, a_i) \ge b_i$, $i$ belonging to $\mathcal{I}$, a finite index set. 
The $a_i$ and $b_i$ are given vectors and scalars, respectively. It is very 
useful that $\le$ constraints can be converted into $\ge$ constraints by 
changing sign and so equality constraints can be represented by a pair 
of $\ge$ constraints. 

In our application, $\mathcal{I}$ is infinite. Since we shall see that the feasible 
y exists in a bounded set we can, in principle, obtain a solution as 
close to optimum as desired by solving an LP with sufficiently many 
constraints. 
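
The sign-change device just described can be sketched in a few lines. This is only an illustration: the tuple layout `(a, sense, b)` and the function name `to_geq_form` are assumptions introduced here and do not come from the paper.

```python
def to_geq_form(constraints):
    """Rewrite mixed linear constraints as a list of (a, b) pairs,
    each meaning (y, a) >= b.

    `constraints` is a list of (a, sense, b), sense in {">=", "<=", "=="}.
    """
    geq = []
    for a, sense, b in constraints:
        negated = ([-x for x in a], -b)
        if sense == ">=":
            geq.append((list(a), b))
        elif sense == "<=":
            geq.append(negated)           # (y, a) <= b  <=>  (y, -a) >= -b
        elif sense == "==":
            geq.append((list(a), b))      # an equality is a pair of >= rows
            geq.append(negated)
        else:
            raise ValueError(f"unknown sense: {sense}")
    return geq


rows = to_geq_form([([1.0, 2.0], "<=", 3.0), ([2.0, 1.0], "==", 0.5)])
for a, b in rows:
    print(a, ">=", b)
```

One inequality row stays one row, while each equality contributes two, so the converted problem is never more than twice the size of the original.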


6.2 Embedding systems in a 2v + 2 space 


Now $\mathbf{h} = (\ldots, 0, \mathbf{h}_{-\nu}, \ldots, \mathbf{h}_0, \ldots, \mathbf{h}_{\nu}, 0, \ldots)$. We will represent $\mathbf{h}$ in a $2\nu + 2$ 
dimensional space where the first $2\nu + 1$ coordinates are $(\mathbf{h}_{-\nu}, \ldots, 
\mathbf{h}_{\nu})$. An additional coordinate augments the projection of $\mathbf{h}$ so that we 
have $(\mathbf{h}_{-\nu}, \ldots, \mathbf{h}_{\nu}, s)$. The augmented vector of $2\nu + 2$ components is 
denoted $y$. The additional coordinate, $s$, is a mathematical convenience 
that will facilitate maximization of the minimum squared distance, as 
we shall see. 

To describe the linear constraints defining the set of admissible $\mathbf{h}$, 
we need to employ $\mathbf{1}_k$ to denote a vector that has all-zero coordinates 
except the $k$th coordinate, which is a one. 

In $2\nu + 2$ space, we describe linear constraints defining the set of 
admissible $\mathbf{h}$. 

1. As a convenient normalization, we assume that the energy in $h$ 
cannot exceed 1, so $\mathbf{h}_0 \le 1$; therefore, 

$$(y, -\mathbf{1}_{\nu+1}) \ge -1.$$

2. $h * h^b = \mathbf{h}$, so for $i \le \nu$, $\mathbf{h}_{-i} = \mathbf{h}_i$; therefore, 

$$(y, -\mathbf{1}_i + \mathbf{1}_{2\nu+2-i}) = 0.$$


3. $\mathcal{H}(\omega)$ is nonnegative. $\mathcal{H}(\omega)$ is the function that has $\mathbf{h}$ for Fourier 
coefficients and the operation of Fourier series is a linear one. So the 
constraints $\mathcal{H}(\omega) \ge 0$, $0 \le \omega \le \pi$, can be put in the form $(y, a_\omega) \ge 0$ 
by defining $a_\omega$ appropriately. There is one constraint for each $\omega$ on 
$0 \le \omega \le \pi$. In our application we can use a discrete set of the form 
$\{\omega_n = n\pi/N\}_{n=0}^{N}$ with $N$ sufficiently large to give adequate accuracy. 

4. The out-of-band energy cannot exceed a prescribed amount $\delta$, so 
$\frac{1}{\pi}\int_{\pi/\rho}^{\pi} \mathcal{H}(\omega)\,d\omega \le \delta$. Let $O(\omega)$ be defined to be the function that 
vanishes for $|\omega| < \pi/\rho$ and is 1 otherwise. Therefore, 
$\frac{1}{2\pi}\int_{-\pi}^{\pi} \mathcal{H}(\omega)\,O(\omega)\,d\omega \le \delta$. By the Parseval theorem, we can express this constraint 
as $(y, a) \le \delta$, where $a$ is a $2\nu + 2$ vector with a zero in the last position 
and the first $2\nu + 1$ coordinates are Fourier coefficients of $O(\omega)$ with 
index of absolute value $\le \nu$. 

5. The $2\nu + 2$ component, seemingly extraneous so far, now comes 
into play. Let $\{E_j(\omega)\}_{j \in \mathcal{E}}$ be the error polynomials. Project them into a 
$2\nu + 2$ dimensional space using the successive Fourier coefficients with 
index of absolute value $\le \nu$ to get the first $2\nu + 1$ components and use 
$-1$ for the last component. Call the resulting vectors $\{e_j\}_{j \in \mathcal{E}}$. It will 
not bother us if some $E_j(\omega)$ have nonzero Fourier coefficients with 
index exceeding $\nu$. Taken together, the constraints $\{(y, e_j) \ge 0\}_{j \in \mathcal{E}}$ 
amount to a statement that, for each admissible $\mathbf{h}$, the squared 
minimum distance is never smaller than the candidate distance $s$. 

The optimal $\mathbf{h}$ is the one maximizing the minimum distance. So the 
constrained $y$ attaining $\max\,(y, \mathbf{1}_{2\nu+2})$ has the optimum design for its 
first $2\nu + 1$ coordinates and the optimal exponent for the last 
coordinate. 

In $2\nu + 2$ space, the set of all $y$ meeting constraints is denoted $Y$. 
$Y$ is not empty. For example, it contains $\delta\,\mathbf{1}_{\nu+1}$, where $\delta$ is a number 
small enough that energy constraints are met. The optimization will 
not degenerate as $Y$ is closed and bounded. $Y$ is closed since it is 
expressible as the intersection of closed half-spaces. $Y$ is bounded 
since each component of $y$ is bounded by the pulse energy constraint. 
To see why, note that $e = \mathbf{1}_{\nu+1}$ shows $y_{2\nu+2} \le 1$. For the remaining 
bounds on the components of $y$ we note 
$\mathcal{H}(\omega) = \sum_{m=-\nu}^{\nu} x_{m+\nu+1}\,e^{im\omega} \ge 0$ 
and so factorization is possible, $\mathcal{H}(\omega) = H(\omega) H^*(\omega)$. The Fourier 
coefficients $(x_1, x_2, \ldots, x_{2\nu+1})$ are sums of products and so, by the Schwarz 
inequality, $y_{\nu+1} = x_{\nu+1}$ bounds all the components of each $y$ vector. 
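
As a rough illustration of how the constraint vectors of items 4 and 5 might be assembled, the sketch below builds the out-of-band-energy vector from the Fourier coefficients of $O(\omega)$ and an error-event vector from the autocorrelation of an error sequence. The function names, the example pulse $(1, 1)/\sqrt{2}$, the error sequence $(1, -1)$, and the candidate value $s = 0.8$ are all assumptions chosen for the example, not values from the paper.

```python
import math


def obe_vector(nu, rho):
    """Item 4: Fourier coefficients of O(w) for |k| <= nu, then a zero
    in the last (2*nu + 2) position. O(w) is 1 for |w| > pi/rho, else 0."""
    coeffs = []
    for k in range(-nu, nu + 1):
        if k == 0:
            coeffs.append(1.0 - 1.0 / rho)
        else:
            coeffs.append(-math.sin(k * math.pi / rho) / (k * math.pi))
    return coeffs + [0.0]


def error_event_vector(e, nu):
    """Item 5: coefficients of E(w) = |sum_k e_k exp(ikw)|^2 with index
    |m| <= nu (the autocorrelation of e), then -1 in the last position."""
    coeffs = []
    for m in range(-nu, nu + 1):
        lag = abs(m)
        coeffs.append(float(sum(e[k] * e[k + lag] for k in range(len(e) - lag))))
    return coeffs + [-1.0]


def inner(y, a):
    return sum(p * q for p, q in zip(y, a))


# Autocorrelation (0.5, 1, 0.5) of the pulse (1, 1)/sqrt(2), nu = 1,
# followed by a candidate squared distance s = 0.8.
y = [0.5, 1.0, 0.5, 0.8]
obe = inner(y, obe_vector(nu=1, rho=2.0))             # out-of-band energy
slack = inner(y, error_event_vector([1, -1], nu=1))   # ||e*h||^2 - s
print(obe, slack)
```

For this example the out-of-band energy equals $1/2 - 1/\pi$, which agrees with integrating $1 + \cos\omega$ over the out-of-band region, and the positive slack says the candidate $s$ does not exceed the distance of that one error event.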


6.3 The optimization algorithm 


The linear constraints include the error events and, in most examples 
of interest, there are too many error events. For example, for $\nu$ as 
small as 4, the estimate in Section 3.1 indicated that there are over 
10*° error events about which we should be concerned. The difficulty 
of too many constraints may sometimes be handled by solving a 
problem with a manageable number of the constraints. If it can be 
verified that the optimum meets all constraints (not just the manage- 
able ones), then the solution to the simplified problem is the same as 
the solution to the difficult problem. We design, via an LP, a response 
maximizing the minimum distance over some error events and then 
seek to verify, using a tree search, that the minimum distance is not 
reduced if one minimizes over all error events. 

If the above procedure is unsuccessful, one can repeat it, enlarging 
the set $\mathcal{E}$ of error events used in the LP to include the minimizing 
error event revealed by the tree search. 
Eventually, the iteration process will converge. Prior to convergence, 
the LP gives an upper bound while the tree search gives a lower bound to 
the $d^2_{\min}$ achievable by the optimum design. 
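
The iteration can be sketched on a toy problem. In the sketch below the "design" is a single number $x$ in $[0, 1]$, each "error event" is a line $a x + b$ giving a squared distance, and two simple stand-ins replace the real machinery: `solve_relaxed` plays the role of the LP over a subset of events, and `worst_event` plays the role of the tree search over all events. Every name and number here is hypothetical; the sketch only shows the upper-bound/lower-bound bookkeeping, not the paper's program.

```python
def solve_relaxed(events):
    """Stand-in for the LP: maximize min_j (a_j*x + b_j) over x in [0, 1]
    using only the listed events. In one dimension the optimum lies at an
    endpoint or where two lines cross."""
    candidates = [0.0, 1.0]
    for i, (a1, b1) in enumerate(events):
        for a2, b2 in events[i + 1:]:
            if a1 != a2:
                x = (b2 - b1) / (a1 - a2)
                if 0.0 <= x <= 1.0:
                    candidates.append(x)

    def value(x):
        return min(a * x + b for a, b in events)

    best = max(candidates, key=value)
    return best, value(best)


def worst_event(x, all_events):
    """Stand-in for the tree search: the true minimizing event for design x."""
    event = min(all_events, key=lambda ab: ab[0] * x + ab[1])
    return event, event[0] * x + event[1]


def design(all_events, initial, tol=1e-12, max_rounds=100):
    events = list(initial)
    for _ in range(max_rounds):
        x, upper = solve_relaxed(events)            # upper bound on the optimum
        event, lower = worst_event(x, all_events)   # lower bound on the optimum
        if lower >= upper - tol:
            return x, lower, events                 # bounds meet: x is optimal
        events.append(event)                        # enlarge the constraint set
    raise RuntimeError("no convergence")


ALL_EVENTS = [(1.0, 0.0), (-1.0, 1.0), (0.5, 0.1), (-2.0, 1.5), (0.0, 0.9)]
x_opt, d2, used = design(ALL_EVENTS, initial=[ALL_EVENTS[0]])
print(x_opt, d2, len(used))
```

Only three of the five events are ever handed to the relaxed solver in this run, which is the point of the procedure: the full list is consulted only through the search for a violated constraint.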

The LP provides an $\mathbf{h}$, while the tree search requires an $h$. The 
minimum-phase deconvolution, $\hat{h}$, is suggested, since, among all $h$ 
satisfying $h^b * h = \mathbf{h}$, $\hat{h}$ has the greatest $\sum_{i=0}^{k} h_i^2$ for each $k$.¹⁷,¹⁸ This 
maximal frontal energy concentration expedites the tree search, which 
operates first on the leading coordinates of $h$. (Orders of magnitude 
of difference in running time have been observed between minimum- 
and maximum-phase deconvolutions in the tree search algorithm.) 
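
The frontal-energy property can be checked on a small example. The sketch below forms three responses with the same autocorrelation by reflecting zeros across the unit circle; the particular factors $(1, 0.5)$ and $(1, -0.4)$ are arbitrary illustrative choices, not taken from the paper.

```python
def convolve(a, b):
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def autocorrelation(h):
    """Nonnegative lags of h correlated with itself."""
    return [sum(h[k] * h[k + m] for k in range(len(h) - m)) for m in range(len(h))]


def partial_energies(h):
    """Running sums h_0^2, h_0^2 + h_1^2, ..."""
    total, out = 0.0, []
    for x in h:
        total += x * x
        out.append(total)
    return out


h_min = convolve([1.0, 0.5], [1.0, -0.4])     # both zeros inside the unit circle
h_mixed = convolve([1.0, 0.5], [-0.4, 1.0])   # one zero reflected outside
h_max = h_min[::-1]                           # both zeros outside

for name, h in [("min", h_min), ("mixed", h_mixed), ("max", h_max)]:
    print(name, autocorrelation(h), partial_energies(h))
```

All three responses share the autocorrelation (1.05, 0.08, −0.2), but the minimum-phase one has accumulated nearly all of its energy after the first coefficient, which is what lets a depth-first search discard poor branches early.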


6.4 Performance 


In estimating the performance of the optimum system employing 
an $h$ of memory $\nu$, power levels were set as follows: For the FBS 
system, as we mentioned in Section 6.2, the pulse energy was bounded 
above by one. The noise level was set to meet the required $P_e$ in the 
benchmark system operating at maximum power. Finally, the 
out-of-band energy constraint was set to a fraction of the noise power. 

The resulting gain versus $\nu$ curves are shown in Fig. 9 with $\rho$ as a 
parameter. The OBE is constrained to $\sigma^2/10$ so a penalty of 
$0.414 = 10 \log_{10}(11/10)$ dB is included in the gain calculation. It is apparent from 
the curves that, even at extraordinary complexity (exceeding $10^6$ 
states) and a $P_e$ requirement of $10^{-6}$, the resulting gains are very 
modest. This conclusion is not sensitive to the exact premises 
underlying the computation. Calculation shows that, for $\nu = 26$, if we allow 
$\delta$ to be larger than 1/10, the gains generally decrease because of the 
OBE penalty. There is little to achieve by making OBE smaller than 
$\sigma^2/10$ since the design is merely more constrained, and omitting the 
OBE penalty cannot add more to the gain than 0.414 dB. When the 
gain is positive, the ACI levels are generally within 0.25 dB of the 
OBE level so the gains based on ACI are not different in any important 
way from those based on OBE. There is no point in showing curves 
for $P_e < 10^{-6}$ in the benchmark system, as the gains can only decrease 
if the design is further constrained. 

Fig. 9—Gain limit versus memory under spectral confinement constraint for $P_e$ = 

Figure 9 has enabled us to determine the relative merit of FBS for 
reasonable complexity. The possibility remains that, for extremely 
large $\nu$, FBS could exhibit substantial gains and that these asymptotic 
gains could improve as $\rho$ increases. 

Figure 9 was derived using a list of 50 error events obtained by 
running the LP-tree search iteration for successive $\nu$ values. To 
conclude that FBS offers at best a very modest improvement over QAM, 
it is only necessary to present upper bounds in Fig. 9 rather than exact 
maximum gains. However, in preparing Fig. 9, we established that it 
is reasonable to exercise the LP-tree search algorithm to guarantee 
the precise optimum gain that can be attained for up to $10^6$ states. It 
is interesting to note that optimum system design can be accomplished 
for systems with such an enormous number of states. 

Fig. 10a—Example of an optimally designed transmitter characteristic and an error 
event characteristic. Vertical dotted line delineates the band edge. 

Figure 10a illustrates an optimally designed spectrum and a 
corresponding minimizing error event. Figure 10b illustrates an interesting 
contrast between a pulse spectrum and an extremal error event. 

For $\rho = 5/4$, the gain is only about 0.5 dB but a very interesting 
behavior is observed. Namely, with little complexity, the maximum 
distance possible is attained in the sense that the matched filter bound 
is obtained. From Section III, eq. (2), we can write the following 
expression for the gain (neglecting the OBE penalty): 

$$G = 10 \log_{10}\!\left[\frac{4^{\rho} - 1}{3\rho}\right] + 10 \log_{10} c(\rho, \nu),$$

where the function $c(\rho, \nu)$ gives the fraction of the matched filter 
energy attained. For fixed $\rho$, $c(\rho, \nu)$ is a nondecreasing function of $\nu$. 

Fig. 10b—The in-band transmitter spectrum is optimized for a limited set of error 
events, one of which is shown. The extraordinary flexibility afforded by over one million 
states allows the optimum spectrum to have some peaks and valleys in opposition to 
those of the minimizing error event. 

With $\rho$ fixed, is $\lim_{\nu \to \infty} c(\rho, \nu) = 1$? In the case $\rho = 5/4$, we have seen 
that the answer is yes. For $\rho > 3$ consideration of the error transform 
$|1 - e^{i\omega}|^2$ shows that the answer is no, as 

$$\lim_{\nu \to \infty} c(\rho, \nu) \le 4 \sin^2\!\left(\frac{\pi}{2\rho}\right) < 1$$

in the limit of stringent out-of-band energy constraints. In light of the 
limited gains available with optimal FBS, it would be of only academic 
interest to pinpoint the largest $\rho$ value for which the matched filter 
bound is attained as complexity is increased. Consequently, we shall 
not pursue this question further. 
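
The figures quoted in this section can be checked numerically. The sketch below assumes that the gain expression has the form $10 \log_{10}[(4^{\rho} - 1)\,c/(3\rho)]$ and that the limiting bound is $4 \sin^2(\pi/2\rho)$, the error transform evaluated at the band edge $\omega = \pi/\rho$; both forms are readings of the text rather than confirmed formulas, and the function names are invented for the example.

```python
import math


def obe_penalty_db(fraction=0.1):
    """Penalty when out-of-band energy adds `fraction` of the noise power."""
    return 10.0 * math.log10(1.0 + fraction)


def gain_db(rho, c=1.0):
    """Assumed gain expression, neglecting the OBE penalty."""
    return 10.0 * math.log10((4.0 ** rho - 1.0) / (3.0 * rho) * c)


def c_limit_bound(rho):
    """Assumed upper bound on the limiting matched-filter fraction."""
    return 4.0 * math.sin(math.pi / (2.0 * rho)) ** 2


print(round(obe_penalty_db(), 3))                    # 0.414
print(round(gain_db(1.25) - obe_penalty_db(), 2))    # 0.53
print(c_limit_bound(3.0), c_limit_bound(4.0))
```

Under these assumptions, attaining the matched filter bound at $\rho = 5/4$ gives about 0.94 dB before the penalty and about 0.5 dB after it, in line with the value quoted above, and the bound first drops below one just beyond $\rho = 3$.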


VII. DISCUSSION 


At this point it is natural to question whether it is worthwhile to 
generalize and consider Faster Multilevel Signaling (FMS). Motivation 
for considering FBS comes from Ref. 5, where the first theoretical 
results on FBS were reported; from Ref. 3, where highly significant 
benefits of FBS were suggested but not established; and from 
discussions with J. Salz, who related that the idea of FBS has been around 
for many years and that it is important to settle the question of its 
merit. Since FBS proves to be unattractive, why should one consider 
FMS, especially when we know that increasing the number of levels 
toward that of the competing QAM system would seem to blur the 
distinction? Can one generally discount the competitiveness of FMS? 
We discuss why we cannot dismiss FMS and why, despite the findings 
on FBS, FMS systems may have some value. 


7.1 Relieving the OBE constraint 


FBS fared poorly. If we look back on our analysis of FBS it is 
obvious that it was the OBE constraint that drove the performance 
level of FBS. We noticed in Section IV suboptimal pulses exhibiting 
substantial gains that could not be realized because of prohibitive 
OBE. The stringency of the OBE constraint was necessitated by the 
substantial overlap of spectra between neighboring systems. As we 
move away from binary toward more levels, in the class of FMS 
systems, to compete with a fixed QAM system, the ratio $\rho = W'/W > 1$ 
decreases. The OBE constraint we need to impose is seen to be more 
relaxed. 

Moreover, as we decrease p, systems are represented for which the 
ACI constraint is not of any direct importance. There is the interesting 
class of questions pertaining to transmitter filter smoothness 
considerations. For example, which performs better: a QAM system 
employing a square root raised cosine pulse with roll-off $\alpha = \rho - 1$, or an 
optimized FMS system with band-edge nulls of specified order and 
with system memory $\nu$? The two systems are required to have the 
same power and information rate. The answer, of course, depends on 
$M$, $\alpha$, $\nu$, and the degree of the band-edge null. The band-edge null is 
useful for spectral confinement as well as for easing synthesizability. 
The imposition of nulls of specific order at specific frequencies leads to 
additional linear constraints and is easily handled by the LP-tree 
search program. 

The simple partial response 0, 1, 1, 0 can be used to illustrate that 
there are situations where FMS can be very beneficial relative to 
QAM. Among all systems required to have a band-edge null, the system 
0, 1, 1, 0 requires the least number of states, m per rail. It is easily 
shown that the response attains the matched filter bound independent 
of m. Figure 11 shows a gain versus roll-off plot, which speaks for 
itself concerning the substantial gains that are available in certain 
cases. 

Fig. 11—Gain versus rolloff characteristics for partial response $2^{-1/2}(0, 1, 1, 0)$, 
where the approximate number of constellation points is shown for the competing QAM 
system, and the number of levels equals the number of states per rail. 


A class of examples where ACI is not of direct concern occurs with 
the voiceband channel, which has severe band-edge attenuation. The 
channel shapes are irregular mounds and there is no obvious spectral 
support to assume. For a specific information rate, what is the best 
baud to use if the transmitter spectrum has a null of given order and 
zero rolloff? This is a paraphrase of the FMS issue coupled with a 
simpler question of where to center the signal spectrum. (The trans- 
mitter design must also account for the effect of nonlinearities and 
the fact that the exact modulus of the channel transfer characteristic 
is not known at the transmitter.) 


7.2 The LP-tree search algorithm 


Aside from the new information on FBS, a major finding of this 
report is that, for the class of MLSD systems considered, optimum 
designs can be accomplished involving numbers of states correspond- 
ing to the capabilities of forthcoming MLSD implementation technol- 
ogy (and far beyond). We have concentrated here on binary systems 
and a very special channel. However, the algorithm extends to apply 
to designing optimum m-ary systems of prescribed complexity oper- 
ating over arbitrary linear dispersive channels. The astronomical 
number of error events is not an obstacle. 

The extended algorithm, now being programmed in the course of 
joint work with G. Vannucci, will provide a basic tool for probing the 
fundamental relationship between attainable rates and system 
complexity for very general systems. Suppose one wants to achieve a 
certain information rate, under spectral confinement requirements 
and with a specific level of complexity. By exercising the LP-tree 
search algorithm for a sufficient number of $\rho$ values, one can locate 
$\rho_{\mathrm{opt}}$, the optimum $\rho \ge 1$, and the associated optimum gain over a 
corresponding QAM system with cosine rolloff spectral shaping. 

APPENDIX A 
Error Event Representations 
A.1 Sequence representation 


$$\ldots, 0, 0, e_0, e_1, \ldots, e_K, 0, 0, \ldots$$


A.2 State representation 

$A_1, A_2, A_3, \ldots, A_{K+\nu+1}$, where the states $A_i$ are defined as the 
successive $\nu$-tuples of the sequence representation, where the all-zero 
$\nu$-tuples are omitted, except for the $\nu$-tuple abutting $e_K$: 

$$(0, 0, \ldots, 0, e_0),\ (0, 0, \ldots, e_0, e_1),\ \ldots,\ (e_K, 0, 0, \ldots, 0),\ (0, 0, 0, \ldots, 0).$$


A.3 Augmented state representation 

$A'_1, A'_2, A'_3, \ldots, A'_{K+\nu+1}$, where the augmented states $A'_i$ are defined 
as the successive $(\nu + 1)$-tuples of the sequence representation 

$$\underbrace{(0\ 0\ \cdots\ 0\ e_0)}_{\nu+1},\ \underbrace{(0\ 0\ \cdots\ e_0\ e_1)}_{\nu+1},\ \ldots,\ (e_K\ 0\ 0\ \cdots\ 0).$$

This representation derives its usefulness from the equality 

$$\sum_{i=1}^{K+\nu+1} (h^b, A'_i)^2 = \|h * e\|^2,$$

where $h^b \triangleq (h_\nu, h_{\nu-1}, \ldots, h_0)$ and the inner product is defined in the 
usual way. The $b$ superscript is read "backward" and the $b$ operation 
is also applied to error events in the memorandum. 
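
The equality can be verified directly on a small case. In the sketch below the response `h`, the error sequence `e`, and the helper names are illustrative assumptions; the windows are the successive $(\nu + 1)$-tuples of the zero-padded error sequence.

```python
def convolve(a, b):
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def augmented_states(e, nu):
    """Successive (nu + 1)-tuples of the error sequence padded with nu zeros
    on each side; there are K + nu + 1 of them for e = (e_0, ..., e_K)."""
    padded = [0] * nu + list(e) + [0] * nu
    return [padded[i:i + nu + 1] for i in range(len(e) + nu)]


h = [1.0, 0.5, -0.25]      # h_0 ... h_nu with nu = 2 (illustrative values)
e = [1, -1, 0, 1]          # e_0 ... e_K with K = 3
nu = len(h) - 1
h_b = h[::-1]              # "backward" response (h_nu, ..., h_0)

states = augmented_states(e, nu)
lhs = sum(sum(p * q for p, q in zip(h_b, s)) ** 2 for s in states)
rhs = sum(x * x for x in convolve(h, e))
print(len(states), lhs, rhs)
```

Each inner product with the backward response is one output sample of the convolution, so a search that walks from one augmented state to the next accumulates the squared distance one term at a time.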


A.4 Functional representation 


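The error sequence maps to the nonnegative cosine polynomial 

$$E(\omega) = |e_0 + e_1 e^{i\omega} + \cdots + e_K e^{iK\omega}|^2$$

on the interval $-\pi \le \omega \le \pi$. 

We shall refer to each $e$ as an error sequence, error event, or error 
pattern. Let $\mathcal{H}(\omega) \triangleq |h_0 + h_1 e^{i\omega} + \cdots + h_\nu e^{i\nu\omega}|^2$ and define 
$(E, \mathcal{H}) \triangleq \frac{1}{2\pi}\int_{-\pi}^{\pi} E(\omega)\,\mathcal{H}(\omega)\,d\omega$. Then by Parseval's theorem, 
$(E, \mathcal{H}) = \|e * h\|^2$, so we have 

$$d^2_{\min}(h) = \min_E\,(E, \mathcal{H}) = \min_E \frac{1}{2\pi}\int_{-\pi}^{\pi} E(\omega)\,\mathcal{H}(\omega)\,d\omega.$$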
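
A brute-force version of this minimization is easy to write for very short responses. The sketch below evaluates the inner product in the coefficient domain (the product of the two autocorrelations, summed over lags) and enumerates error sequences exhaustively. Several things are assumptions made for the example: error symbols are normalized to $\{-1, 0, 1\}$, only sequences up to `max_len` are tried (so in general the result is an upper bound on the true minimum), and exhaustive enumeration stands in for the tree search.

```python
from itertools import product


def autocorr(x, max_lag):
    return [sum(x[k] * x[k + m] for k in range(len(x) - m)) for m in range(max_lag + 1)]


def distance_sq(e, h):
    """Inner product of E and the power spectrum of h, computed from the two
    autocorrelations: lag 0 once, every other lag twice."""
    nu = len(h) - 1
    re, rh = autocorr(e, nu), autocorr(h, nu)
    return re[0] * rh[0] + 2.0 * sum(a * b for a, b in zip(re[1:], rh[1:]))


def d2_min(h, max_len=6):
    """Minimum over error sequences with e_0 = 1 and entries in {-1, 0, 1}."""
    best, best_e = float("inf"), None
    for n in range(1, max_len + 1):
        for tail in product((-1, 0, 1), repeat=n - 1):
            e = (1,) + tail
            if e[-1] == 0:
                continue              # a trailing zero repeats a shorter event
            d = distance_sq(e, h)
            if d < best - 1e-12:
                best, best_e = d, e
    return best, best_e


h = [2 ** -0.5, 2 ** -0.5]            # unit-energy response (1, 1)/sqrt(2)
print(d2_min(h))
```

For the unit-energy response $(1, 1)/\sqrt{2}$ no error sequence does better than the single error, so the minimum equals the pulse energy. The number of sequences grows as $3^{n-1}$ with length $n$, which is why enumeration of this kind is hopeless at realistic memory.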

APPENDIX B 
Expediting Tree Search When Response Is Symmetrical 


We present some observations other than the four symmetry 
conditions that are useful for efficient calculation of $d^2_{\min}$ and a minimizing 
error event $e$ for a symmetric transmitter impulse response. 

Let $\{s_j\}_{j=1}^{\infty}$ denote the resulting sequence of scalar products in $h * e$. 
Let $K$ be the smallest integer satisfying $s_1 s_K \neq 0$ with $s_k = 0$ for 
$k \notin \{1, 2, \ldots, K\}$. If, in the course of searching the tree, an error event 
breaking the current record, $\Delta$, is encountered, then, for it to be a 
minimizing event, we must have 

$$s_1^2 + s_2^2 + \cdots + s_K^2 < \Delta.$$

Let $[K/2]$ denote the largest integer less than or equal to $K/2$. For a 
record breaking error event or that error event in reverse (or both) we 
must have 

$$s_1^2 + s_2^2 + \cdots + s_{[K/2]}^2 < \Delta/2. \qquad (5)$$

Since $\|h * e\|^2 = \|h * (\pm e^b)\|^2$, error events for which it is established 
that the inequality (5) is reversed need not be explored further in the 
tree search for $d^2_{\min}$. 

To expedite the process of seeking $d^2_{\min}$ among error sequences for 
which (5) holds, we discuss the calculation of the height at which the 
exploration of the growth of nodes terminates. Let $L$ be the first 
integer for which the accumulated sum $s_1^2 + s_2^2 + \cdots + s_{L+1}^2 > \Delta/2$ 
so that $s_1^2 + s_2^2 + \cdots + s_L^2 \le \Delta/2$. Clearly, $L \ge [K/2]$ so $2L + 1 \ge K$. 
Once $L$ is detected there is no need to explore any events involving 
$2L + 2$ scalar products. To put it another way, if $L' = L + 1$ is the 
first index for which $\Delta/2$ is exceeded, then $s_{2L'} = 0$ and $2L' > K$. 

It is not always necessary to search to height $2L'$ to terminate 
growth exploration. To see this note that from the height, $\eta$, of 
occurrence of the last nonzero element in the event under exploration 
we have that $K \ge \eta + \nu$. So once $L'$ is determined exploration of 
growth is terminated if $\eta + \nu > 2L'$. 

Adaptive transversal equalization is an effective and relatively new 
countermeasure for dispersive multipath propagation in terrestrial digital radio 
networks. In this paper we describe the design and performance of a five-tap 
baseband analog equalizer developed for a family of 16-QAM, 90-Mb/s radio 
systems. Laboratory measurements and field evaluation during a five-month 
fading season in Palmetto, Georgia, indicate that the use of this adaptive 
transversal equalizer can significantly reduce the need for costly space-diver- 
sity equipment. 


I. INTRODUCTION 


The impairment of terrestrial digital microwave reliability due to 
multipath propagation is widely recognized.¹ Unlike FM radio systems, 
where system outage is predominantly determined by the thermal 
noise aspect of fading, digital radio is also affected by the dispersive 
character of multipath fading. This dispersion, engendered by 
significant amplitude and delay distortion across the channel bandwidth, 
causes considerable Intersymbol Interference (ISI) that degrades 
digital radio reliability well beyond that expected from the flat fade 
margin alone.² Multipath-induced distortion thus is considered the 
predominant cause of digital radio outage for frequencies under 12 
GHz. 

Presently, several methods are used to counter the impact of 
multipath fading. These include frequency diversity,³ space diversity,⁴ and 
adaptive Intermediate Frequency (IF) equalization. Examples of the 
latter include slope equalizers⁵ and notch or resonance equalizers.⁶ 
However, frequency-selective fading corrupts both the amplitude and 
phase of a transmitted signal. While IF equalizers can be designed to 
condition a channel properly for minimum phase fading, they double 
the delay distortion during periods of nonminimum phase fading. 
(Minimum phase and nonminimum phase fading are clearly defined by 
Giger and Barnett⁷ for a two-path statistical model of multipath 
propagation.) This effect naturally impacts the outage of those digital 
radio systems that rely solely on amplitude correction.’ 

Although adaptive transversal equalizers are a relatively new 
countermeasure to multipath fading in digital radio systems,⁸ their prior 
application in mitigating the effects of linear distortion in other, 
lower-speed, digital communication networks is firmly established. Current 
practice is to use transversal equalizers in conjunction with IF 
equalizers. In a recent study, Foschini and Salz⁹ considered the application 
of equalization techniques to digital data transmission over radio 
channels subject to frequency-selective fading. Their theoretical study 
of ideal transversal equalizers with an infinite number of taps clearly 
established the utility of linear equalization during multipath propa- 
gation. These equalizers are especially noteworthy in that they are 
capable of providing amplitude and delay equalization for minimum 
and nonminimum phase fades. 

The baseband synchronous transversal equalizer briefly described 
here was designed for a family of 16-QAM (Quadrature Amplitude 
Modulation), 90-Mb/s radio systems. Designated DR 6-30 and DR 11-40, 
these digital systems provide 3-bit/Hz operation in the 6- and 11-GHz 
common carrier bands, respectively.¹⁰ In this paper, we focus on 
the design and performance features of a high-speed (approximately 
22.5-MHz) synchronous transversal equalizer. A theoretical develop- 
ment of equalization principles is specifically omitted since those 
points are amply discussed in the technical literature (for example, 
see Chapter 6 of Ref. 11). 


II. GENERAL DESCRIPTION 
2.1 DR 6/DR 11 radio system 


Figure 1 functionally depicts the DR 6/DR 11 digital radio system. 
Two independent, 45-Mb/s random data streams are differentially 
encoded to form two rails, each with four-level amplitude states, and 
then modulated in quadrature to form a 16-QAM, 90-Mb/s IF signal 
at 70 MHz. The Radio-Frequency (RF) transmitter modulates the IF 
signal up to 6 or 11 GHz for transmission over a line-of-sight terrestrial 
path to the digital receiver. At the receiver the signal is down-converted 
to IF, where it is processed by an Automatic Gain Control (AGC) 
amplifier and adaptive slope equalizer. The baseband receiver 
demodulates the IF signal into in-phase (I) and quadrature (Q) rails, 
where the baseband data states are detected and estimates of the 
original transmitted data are made. An error detector provides for 
system performance monitoring. 

Fig. 1—DR 6/DR 11 digital radio system. 

Fig. 2—Baseband receiver with adaptive transversal equalizer. 

Figure 2 functionally illustrates the baseband receiver. As described 
above, the 90-Mb/s, QAM IF signal is demodulated into I and Q rails, 
each 45 Mb/s. After conventional half-Nyquist spectral shaping (using 
delay-equalized analog filters with raised-cosine shaping and 45-per- 
cent roll-off), the four-level signals enter the transversal equalizer for 
removal of the linear intersymbol interference previously generated 
by multipath propagation, imperfect Nyquist filtering, etc. After 
baseband equalization, the transmitted symbol states are estimated at 
the decision point, and the decoded binary signals are used in subse- 
quent digital processing. 


2.2 Adaptive transversal equalizer 


To remove in-rail and cross-rail intersymbol distortion, two adaptive 
transversal filters (each with five complex-valued tap weights) are 
configured for baseband equalization of QAM signals. The selection 
of five taps is based on theoretical studies of equalizer performance as 
a function of equalizer length. For example, Amitay and Greenstein¹² 
have investigated the multipath outage performance of digital radio 
receivers using finite-length adaptive equalizers. Using Rummler's 
statistical description of multipath channels,¹³ equalizer performance 
for a broad ensemble of fading scenarios was simulated. Their study 
indicated that five synchronous taps considerably reduce ISI relative 
to performance attained with three taps and that equalizers with seven 
or more taps, while obviously further reducing ISI, exhibit a rapidly 
diminishing relative reduction in linear distortion. (Independently, 
Murase et al.¹⁴ and Takenaka et al.¹⁵ have also selected five-tap filters 
for their transversal equalizer designs.) 
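
The in-rail and cross-rail structure can be illustrated with a small sketch. It models one rail's five-tap filter as two sets of real weights, one acting on the rail's own samples and one on the other rail's samples. The function name, the toy channel with pure cross-rail leakage `g`, and the symbol values are assumptions for the example and do not describe the DR 6/DR 11 hardware.

```python
def rail_output(in_rail, cross_rail, w_in, w_cross, n):
    """Output at symbol n of one five-tap filter. In-rail taps act on the
    rail's own samples, cross-rail taps on the other rail; taps are indexed
    -2..2 and samples outside the record are treated as zero."""
    def tap_sum(samples, weights):
        total = 0.0
        for k, w in zip(range(-2, 3), weights):
            if 0 <= n - k < len(samples):
                total += w * samples[n - k]
        return total
    return tap_sum(in_rail, w_in) + tap_sum(cross_rail, w_cross)


# Toy channel: each rail picks up a fraction g of the other rail.
g = 0.1
a_i = [3, -1, 1, -3, 1, 3, -1]
a_q = [-1, 1, 3, 3, -3, 1, -1]
x_i = [ai + g * aq for ai, aq in zip(a_i, a_q)]
x_q = [aq - g * ai for ai, aq in zip(a_i, a_q)]

w_in = [0.0, 0.0, 1.0, 0.0, 0.0]       # center in-rail tap only
w_cross = [0.0, 0.0, -g, 0.0, 0.0]     # center cross-rail tap cancels the leakage
y_i = [rail_output(x_i, x_q, w_in, w_cross, n) for n in range(len(a_i))]
print([round(v / (1 + g * g), 6) for v in y_i])
```

With the center cross-rail tap set to `-g`, the I-rail output is the transmitted I sequence scaled by `1 + g*g`, so the leakage from the Q rail is removed. The off-center taps would deal with interference from neighboring symbols in the same way.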

The equalizer tapped-delay lines are fabricated using lumped-delay 
elements isolated with buffer amplifiers. The buffer amplifiers are 
Hybrid Integrated Circuits (HICs) and provide high isolation between 
the delay line and coefficient-weighting taps. Tap weighting is accom- 
plished with variable gain amplifiers. These, too, are hybrid integrated 
circuits fabricated in single in-line packages, thereby permitting high- 
density electronics on each circuit board. Summing amplifiers (also 
HICs) then add the individual tap-weighted signals to form the filter 
output. 

The coefficient control portion of the equalizer uses zero-forcing 
adaptation and is implemented with high-speed Emitter-Coupled 
Logic (ECL). The control circuit accepts error polarity and estimated- 
symbol polarity from the in-phase and quadrature decision circuits. 
Appropriately delayed versions of these bits are then correlated during 
each symbol period using exclusive OR gates. The time-averaged 
values of these correlations determine the weight of each tap in the 
two transversal filters. Time averaging is achieved using operational 
amplifier filters optimized for the trade-off between coefficient noise 
and dynamic multipath tracking ability. 
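
A discrete-time sketch of this kind of control loop is given below. It is a simplified stand-in rather than a model of the circuit: one real rail instead of two coupled rails, a made-up three-sample channel, a fixed step `mu` accumulating the polarity products in place of the analog time-averaging filters, and function names invented for the example. The product of error polarity and symbol polarity plays the role of the exclusive-OR correlation.

```python
import random


def convolve(a, b):
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def sign(x):
    return (x > 0) - (x < 0)


def slicer(y):
    """Nearest four-level symbol in {-3, -1, 1, 3}."""
    return min((-3, -1, 1, 3), key=lambda s: abs(y - s))


def residual_isi(q):
    """Sum of |off-center samples| over |center sample| (odd-length q)."""
    mid = len(q) // 2
    return sum(abs(v) for i, v in enumerate(q) if i != mid) / abs(q[mid])


def adapt(channel, n_symbols=20000, mu=5e-4, seed=1):
    rng = random.Random(seed)
    a = [rng.choice((-3, -1, 1, 3)) for _ in range(n_symbols)]
    x = convolve(a, channel)               # received samples
    delay = len(channel) // 2              # position of the main channel sample
    taps = [0.0, 0.0, 1.0, 0.0, 0.0]       # c_-2 ... c_2
    dec = [0] * n_symbols                  # estimated symbols
    err = [0.0] * n_symbols                # decision errors
    for n in range(2, n_symbols - 2):
        y = sum(c * x[n + delay - k] for k, c in zip(range(-2, 3), taps))
        dec[n] = slicer(y)
        err[n] = y - dec[n]
        m = n - 2                          # symbol whose neighbours are all decided
        if m >= 4:
            for idx, k in enumerate(range(-2, 3)):
                # error polarity times delayed symbol polarity
                taps[idx] -= mu * sign(err[m]) * sign(dec[m - k])
    return taps


channel = [0.05, 1.0, 0.15]
before = residual_isi(channel)
after = residual_isi(convolve(channel, adapt(channel)))
print(before, after)
```

The step size sets the trade-off mentioned above: a larger `mu` tracks a changing channel faster but leaves the tap weights jittering more around their settled values.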

The entire equalizer consists of three 1-inch plug-in circuit packs in 
a format compatible with the DR 6/DR 11 terminal or regenerative 
equipment. A photograph of these circuit packs appears in Fig. 3. Two 
of these packs are identical analog transversal equalizers, one for 
in-phase and cross-rail equalization of the I rail, the other for similar 
equalization of the Q rail. Equalizer coefficient control is generated in 
the third circuit package. 

Fig. 3—Adaptive transversal equalizer consisting of two transversal filter circuit 
packs and one zero-forcing control circuit pack. 


III. EQUALIZER PERFORMANCE 
3.1 Theoretical performance 
3.1.1 Reduction of peak distortion 


As we noted above, five-tap synchronous transversal equalizers are 
theoretically capable of substantially reducing intersymbol 
interference caused by frequency-selective fading. For zero-forcing coefficient 
adaptation, the peak distortion, $D_p$, of the corrupted digital signal is 
minimized.¹¹ (As used here, $D_p$ is equivalent to Peak Eye Closure 
(PEC) for binary transmission.) Representative theoretical 
performance is illustrated in Fig. 4. In Fig. 4 we consider one digital rail (I or 
Q) and show the variation of peak distortion, with and without 
equalization, of a digital signal for a 20-dB fade notch depth as a 
function of notch position in a ±18-MHz channel. (Ideally, distortion 
in the other rail would be identical.) The ordinate on the right side of 
this figure provides the corresponding peak eye closure for a four-level 
signal, given by PEC = $D_p(L - 1)$, where $L$ is the number of discrete 
transmitted symbol states on each rail. This illustrative fade grossly 
closes the digital eye with $D_p > 1$ over at least a portion of the channel 
bandwidth. This latter condition highlights an analytic limitation of 
zero-forcing equalization: if $D_p > 1$, the coefficient set may be 
suboptimal.¹¹ In spite of this, other analysis (to be discussed shortly) and 
our own measured data show that adaptive transversal equalizers do, 
in fact, notably reduce intersymbol interference in just such an 
environment. Moreover, zero-forcing is known to assure a global minimum 
if $D_p < 1$, affords comparative ease of circuit realization, and minimizes 
Bit Error Rate (BER) in the high signal-to-noise ratio that typifies 
quiescent digital radio operation. The other dominant adaptation 
approach for automatic equalizers, Least-Mean-Square (LMS) 
algorithmic control, is more difficult to realize in high-speed circuits and 
has a proclivity for unsatisfactory local minima when used in the 
decision-directed mode.¹⁶ 

Fig. 4—Theoretical reduction in peak distortion, $D_p$, with a five-tap QAM transversal 
equalizer arrangement. Peak eye closure is noted for four-level transmission, that is, 
one rail of a 16-QAM system. 
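
The relation between peak distortion and peak eye closure is simple enough to work through numerically. The sketch below assumes the common definition of peak distortion as the sum of the magnitudes of the intersymbol samples divided by the main sample; the sampled response is invented for illustration and is not measured data.

```python
def peak_distortion(samples, center):
    """Sum of |intersymbol samples| over |main sample| (assumed definition)."""
    main = abs(samples[center])
    return sum(abs(v) for i, v in enumerate(samples) if i != center) / main


def peak_eye_closure(d_p, levels):
    """PEC = D_p * (L - 1) for L discrete symbol states per rail."""
    return d_p * (levels - 1)


response = [0.05, -0.1, 1.0, 0.15, -0.03]   # illustrative sampled pulse
d_p = peak_distortion(response, center=2)
print(d_p, peak_eye_closure(d_p, 2), peak_eye_closure(d_p, 4))
```

A response that leaves a binary eye two-thirds open ($D_p = 0.33$) all but closes a four-level eye (PEC = 0.99), which is why multilevel rails are so much more sensitive to the same multipath distortion.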


3.1.2 Equipment signatures 


Equipment signatures¹⁷,¹⁸ provide a particularly meaningful measure 
of equalization capability since they can be directly related to outage 
predictions for digital radio systems. The signatures are $10^{-3}$ BER 
contours: at each point on the contour, the fade notch depth 
corresponding to a $10^{-3}$ BER (defined as a digital radio outage) is specified 
as a function of notch position for a fixed-delay statistical model of 
multipath propagation. Figure 5 presents theoretical signatures 
computed by M. H. Meyers¹⁹ for no equalization, slope equalization, and 
transversal equalization. Figures 4 and 5 confirm that five-tap 
transversal equalizers theoretically provide a significant reduction in linear 
distortion. Indeed, even the use of zero-forcing control for fades with 
$D_p > 1$ yields a degree of equalization that mitigates digital radio 
outage. The data of Fig. 5 indicate that a fade notch depth as shallow 
as 7 dB can cause an outage in the absence of countermeasures. When 
the radio receiver is equipped with a transversal equalizer, outages are 
not experienced until the notch depth reaches approximately 23 dB, 
which can occur for an unequalized $D_p > 1$, as shown in Fig. 4. Also 
observe from Fig. 5 that slope equalizer performance is influenced by 
the minimum or nonminimum phase character of the fade, as we 
mentioned earlier. This is not a limitation of transversal equalization. 


3.2 Measured laboratory performance 
3.2.1 Equipment signatures 


The definition and significance of equipment signatures were pre- 
viously mentioned. The laboratory measurement of these signatures 
is facilitated through the use of a new computer-controlled multipath 
fade simulator that continuously varies notch depth and notch
frequency to achieve a $10^{-3}$ BER. The simulator is inserted in the IF
path of the receiver just before the AGC amplifier (see Fig. 1). 

Fig. 5—DR 6 theoretical equipment signatures for 16-QAM digital radio. Performance
for radio without equalization, with an adaptive slope equalizer, and with a five-tap 
synchronous transversal equalizer using zero-forcing control. 


Signatures were measured using two DR 6 receivers, the first 
equipped with an adaptive slope equalizer (the standard arrangement) 
and the second equipped with both the adaptive slope equalizer and a 
five-tap adaptive transversal equalizer. The $10^{-3}$ BER minimum phase
and nonminimum phase equipment signatures appear in Fig. 6. As the 
data reveal, the adaptive slope equalizer performs best when used for 
minimum phase fades, with a performance deterioration experienced 
for nonminimum phase fades. We commented earlier that IF equalizers 
typically double delay distortion during nonminimum phase fading, 
and this effect can degrade equipment signature performance. The 
same effect naturally occurs when the adaptive slope and synchronous 
transversal equalizers are used together. Comparing both sets of 
curves, however, we also observe the significant improvement in 
equipment signature performance that can be ascribed to the trans- 
versal equalizer alone. 

Fig. 6—DR 6 measured equipment signatures for 16-QAM digital radio. Performance
for radio with adaptive slope equalization and adaptive slope and transversal
equalization for 6.3-ns path delay. (Adapted from Ref. 20.)

The relative reduction in digital radio outage time is estimated using
a prescription described by Meyers,” wherein the areas under
equipment signature contours, with and without transversal equalization,
are compared. The predicted relative outage reduction factor, derived
from theoretical equipment signatures for combined ideal slope and 
transversal equalization (see Fig. 5) is 5. This assumes equally probable 
minimum and nonminimum phase fading. The predicted relative 
outage reduction factor for the measured equipment signatures (see 
Fig. 6) is 4.5, again assuming equally probable minimum and non- 
minimum phase fading. The relative reduction factors for other ratios 
of minimum to nonminimum phase fading range from 4 to 5. The 
measurements in Fig. 6 attest to the quality of the transversal equalizer 
circuit design itself. Regarding this point, baseband implementation 
of the equalizer permits integration of substantial portions of the 
circuitry, thus simplifying design and manufacture. The development 
of new carrier and timing recovery circuits also helps place laboratory 
performance close to the theoretical limit shown in Fig. 5. 


3.2.2 Simulation of dynamic fading 


An important aspect of multipath propagation is its rapid temporal 
variation. To assure optimal equipment performance in the field, 
dynamic (time-varying) tests were performed during the development 
phase. Dynamic multipath fading is produced in the laboratory by 
controlling the continuously variable fade simulator with a
microcomputer. Realistic time sequences of multipath behavior were
programmed into the simulator. Equalizer performance was monitored
during the simulation of these dynamic fades, thereby permitting 
optimization of the equalizer timing-recovery, carrier-recovery, and 
coefficient-updating loop parameters. 

Several aspects of an equalizer’s response to dynamic multipath
propagation are exercised with the following test sequence
(schematically depicted in Fig. 7): starting with a shallow fade notch depth $d_1$
at a particular notch frequency $f_1$, the notch depth increases at a rate
$s_1$ until a notch depth $d_2$ is reached. The notch then sweeps across a
band of frequencies from $f_1$ to $f_2$ at a rate $s_2$. At the notch frequency
$f_2$, the notch depth decreases from $d_2$ back to $d_1$ at a rate $s_3$. This
fading trajectory retraces itself and is repeated several times for
statistical averaging of the receiver’s error performance. A test
sequence like this tests the receiver’s ability to track notch depth and
notch frequency dynamics. For trajectory parameters of $d_1$ = 6 dB, $d_2$
= 15 dB, $s_1 = s_3$ = 9 dB/s, $f_1$ = -12 MHz (12 MHz below the IF
frequency), $f_2$ = +12 MHz, and $s_2$ = 24 MHz/s, the transversal equalizer
consistently operates error free. Those test velocities are also faster 
than 90 percent of all observed notch depth and notch position rates 
of change reported by Sakagami et al.”” 
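
The timing of one pass through this test sequence follows from the quoted rates. The sketch below is only an illustration: it assumes the notch depth (in dB) and the notch frequency change linearly in time, and the function names are hypothetical rather than part of the simulator software.

```python
def trajectory_durations(d1_db, d2_db, s1_db_per_s, s3_db_per_s,
                         f1_mhz, f2_mhz, s2_mhz_per_s):
    """Return (deepen, sweep, recover) durations in seconds for one pass."""
    deepen = (d2_db - d1_db) / s1_db_per_s
    sweep = abs(f2_mhz - f1_mhz) / s2_mhz_per_s
    recover = (d2_db - d1_db) / s3_db_per_s
    return deepen, sweep, recover


def notch_state(t, d1_db, d2_db, f1_mhz, f2_mhz, durations):
    """Return (notch depth in dB, notch frequency in MHz) at time t of one pass."""
    deepen, sweep, recover = durations
    if t <= deepen:
        # Phase 1: notch deepens at fixed frequency f1.
        return d1_db + (d2_db - d1_db) * t / deepen, f1_mhz
    if t <= deepen + sweep:
        # Phase 2: notch sweeps from f1 to f2 at fixed depth d2.
        frac = (t - deepen) / sweep
        return d2_db, f1_mhz + (f2_mhz - f1_mhz) * frac
    # Phase 3: notch recovers to depth d1 at fixed frequency f2.
    frac = min((t - deepen - sweep) / recover, 1.0)
    return d2_db - (d2_db - d1_db) * frac, f2_mhz


# Parameters quoted in the text.
durations = trajectory_durations(6, 15, 9, 9, -12, 12, 24)
print("phase durations (s):", durations)
for t in (0.0, 0.5, 1.5, 3.0):
    print(t, notch_state(t, 6, 15, -12, 12, durations))
```

Under these assumptions each of the three phases lasts one second, so a single pass takes about three seconds before the trajectory retraces itself.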


3.3 Field evaluation 


Fig. 7—Test sequence for dynamic simulation of multipath propagation.

Fig. 8—Time-below-level propagation data for Palmetto, Georgia, in 1982.

The adaptive transversal equalizer was installed in a DR 6-30 field
test facility at Palmetto, Georgia, on June 4, 1982. This modified
baseband receiver was compared with a standard DR 6 receiver
(equipped with an adaptive slope equalizer) during a multipath season 
from June 6 to November 6, 1982. Propagation data collected during 
the field evaluation period are shown in Fig. 8.”° The abscissa of this 
figure reports fade notch depth; the ordinate indicates time faded 
below the respective abscissa value. A considerable amount of fading 
exhibits notch depths in excess of 10 dB, which, from Fig. 5, could 
correspond to an outage in the absence of suitable countermeasures. 
The two baseband receivers shared the same RF and IF front ends. 
Field measurements, monitored by AT&T Bell Laboratories personnel 
from Merrimack Valley, were grouped into 11 two-week intervals. In
Fig. 9, the number of seconds for which BER > $10^{-6}$ is presented for
both receivers for each of the two-week intervals. Also presented is
the ratio of these two time measurements, representing a composite
improvement factor attributable to the transversal equalizer alone.
Figure 10 presents similar data for a BER > $10^{-3}$. In Fig. 11 we show
the incidence of frame loss with and without the equalizer, as well as
the corresponding reduction in loss of frame.

Fig. 9—Field performance for BER > $10^{-6}$.

Fig. 10—Field performance for BER > $10^{-3}$.

For the 22 weeks represented in Figs. 9 through 11, the adaptive
transversal equalizer provided composite improvement factors of 3.6
for BER > $10^{-6}$, 3.7 for BER > $10^{-3}$, and 2.9 for frame loss. The $10^{-3}$
BER improvement factor of 3.7 is only about 20 percent below the predicted
improvement factor of 4.5, based on laboratory-measured equipment
signatures.
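
As a rough check of this comparison (a worked calculation added for illustration, not part of the original analysis), the shortfall of the measured factor relative to the prediction is

$$\frac{4.5 - 3.7}{4.5} \approx 0.18,$$

that is, roughly 18 percent, which is in line with the rounded figure of 20 percent quoted in the text.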


Fig. 11—Field performance for frame loss. [Bar chart: number of seconds of
frame loss with and without the transversal equalizer, and the improvement
factor, for two-week intervals from 6-6 to 10-24 (month-day).]


IV. CONCLUSIONS


Because of their ability to adaptively equalize multipath-induced
amplitude and delay distortion for minimum and nonminimum phase
fading, synchronous transversal equalizers promise to play an
important role as a multipath countermeasure for terrestrial digital
microwave networks.

In this paper we summarize the major design and performance 
features of a five-tap analog transversal equalizer for the baseband 
receivers of two 16-QAM, 90-Mb/s digital radio systems. The equal- 
izers heavily rely on HIC technology for their tapped-delay line buffer 
amplifiers, tap-weighting coefficients, and summing circuitry. The 
zero-forcing adaptation portion of the equalizer is realized with
high-speed ECL logic. The entire equalizer is packaged in three 1-inch
plug-in circuit packs.

During design, the equalizer was tested for its static equipment 
signature performance and dynamic tracking capability. The latter 
evaluation was facilitated with a special-purpose, computer-controlled 
multipath fade simulator. During a 22-week field trial evaluation in 
Palmetto, Georgia, the equalizer reduced the overall incidence of DR 
6-30 radio outage by more than a factor of 3. System estimates indicate 
that this improvement factor could eliminate the need for space- 
diversity reception on more than 50 percent of the short-haul digital 
radio hops that currently use it. Use of the baseband adaptive trans- 
versal equalizer thus can provide considerable cost savings. 

Adaptive Differential Pulse Code Modulation (ADPCM) systems can
provide high-quality digitizations of telephone-bandwidth speech at a bit rate of
32 kb/s. At a lower bit rate such as 24 kb/s, the quality of the speech is limited 
by an easily perceptible level of quantization noise. This paper proposes an 
adaptive postfiltering procedure that can significantly enhance the quality of 
lower bit rate ADPCM. The coefficients of the postfilter are easily derivable 
from the predictor coefficients in the ADPCM decoder. In a subjective test 
involving 14 listeners and two sentence-length test inputs, the enhanced
24-kb/s speech with an optimized postfilter design ranks very close to conven- 
tional 32-kb/s speech. A suggested application of the postfiltering procedure 
is in packet voice or mobile radio systems where substandard bit rates such as 
24 kb/s or 16 kb/s are sometimes necessary. The postfiltering algorithm has 
also been successfully tested in non-DPCM situations, such as in the enhance- 
ment of speech degraded by additive white Gaussian noise. 


I. INTRODUCTION 


Recent algorithms for adaptive prediction (Ref. 1) and adaptive
quantization (Ref. 2) have led to the realization of high-quality ADPCM
systems at 32 kb/s. This bit rate is the result of 8-kHz sampling and 
quantization using 4 bits/sample. The quality of 24-kb/s speech using 
the same prediction algorithm and 3-bits/sample coding is limited by 
a clearly perceptible level of quantization noise. This paper proposes 
a very simply implemented postfiltering algorithm, which provides a 
significant enhancement of 24-kb/s quality. In a subjective test to be 
described later in this paper, the enhanced 24-kb/s ADPCM system
was ranked very close to conventional 32-kb/s ADPCM. 
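
The bit rates quoted above follow directly from the sampling rate $f_s$ and the number of bits per sample $b$ (symbols introduced here only for illustration):

$$R = f_s\, b:\qquad 8\ \text{kHz} \times 4 = 32\ \text{kb/s},\qquad 8\ \text{kHz} \times 3 = 24\ \text{kb/s},\qquad 8\ \text{kHz} \times 2 = 16\ \text{kb/s}.$$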

A natural application of the postfiltering procedure would be in 
variable bit rate ADPCM systems such as packet networks or mobile 
radio where substandard bit rates such as 3 or 2 bits/sample are 
occasionally encountered. The postfiltering technique described in this 
paper is particularly effective at the bit rate of 3 bits/sample. It is also 
effective in non-DPCM situations such as in the enhancement of 
speech degraded by additive white Gaussian noise. When the signal- 
to-noise ratio (s/n) at the input to the postfilter is too low (as in 2-bit 
ADPCM or with white Gaussian noise at a relative noise level exceed- 
ing approximately —3 dB), noise suppression can only be achieved at 
the expense of severe distortion of the speech signal itself. When the 
24-kb/s ADPCM is enhanced, the introduction of speech distortion is 
perceptible, but the effect of noise reduction is by far the more 
dominant phenomenon. 


II. A SEMIQUANTITATIVE EXPLANATION OF THE POSTFILTERING
TECHNIQUE


The philosophy of the postfiltering technique is represented in Fig.
1. Part (a) of the figure shows a signal spectrum with two narrowband
components in the frequency regions $W_1$ and $W_2$, and a flat noise
spectrum that is 15 dB below the first signal component but 5 dB
above the second signal component. An ideal postfilter for this
situation would have a gain of unity (0 dB) in the regions $W_1$ and $W_2$ and
a gain of zero ($-\infty$ dB) in the rest of the frequency range. In real
speech applications, implementation of such all-or-none responses is 
impractical except in the special cases where the stopband regions of 
the postfilter are merged into a single contiguous frequency region as 
in a low-pass or high-pass postfilter.** 

A more practical approach, proposed in this paper, is the use of a
postfilter frequency response that peaks in the regions $W_1$ and $W_2$,
but is significantly lower in the rest of the frequency range. Figure 1b
illustrates an extreme example of this approach. Here, the transfer
function of the postfilter is chosen to be identical to the input signal
spectrum in Fig. 1a. The resulting spectra of postfiltered signal and
postfiltered noise preserve the original signal-to-noise ratios of 15 dB
and -5 dB in the regions $W_1$ and $W_2$, respectively. However, the noise
in the rest of the illustrated frequency range is now much lower,
relative to the signal levels, than in part (a) of the figure. Specifically,
the signal-to-background-noise ratios for regions $W_1$ and $W_2$ are now
45 dB and 10 dB, in place of 15 dB and -5 dB in the absence of
postfiltering. This suppression of background noise also implies that
the residual noise spectrum after postfiltering is very similar to the
input signal spectrum itself. In speech applications, noise that is
shaped in this manner tends to be perceived as speech.

Fig. 1—An idealized explanation of the effects of postfiltering, assuming a signal
with two narrowband components and a noise spectrum that is white. (a) Signal and
noise spectra at the input to the postfilter, showing signal-to-noise ratios of 15 dB and
-5 dB in signal frequency bands $W_1$ and $W_2$. (b) Spectra of postfiltered signal and
postfiltered noise, assuming a postfilter transfer function identical to the signal spectrum
in (a). Regions $W_1$ and $W_2$ continue to have local signal-to-noise ratios of 15 dB and -5
dB as in (a), but the signals are now 45 dB and 10 dB above the out-of-band noise level.
In (a) the corresponding numbers are only 15 dB and -5 dB. The overall effect is a
reduction of perceived noise, but the price paid is a change in the relative strengths of
the signal components in $W_1$ and $W_2$.

A postfiltering operation such as that in Fig. 1b provides a significant
amount of signal enhancement for the reasons just described. It should
be noted, however, that such a postfilter also distorts the signal. For
example, the difference in signal levels in the regions $W_1$ and $W_2$ has
been distorted, from 20 dB in Fig. 1a to 35 dB in Fig. 1b. The
postfiltering technique to be described in the next section provides a
controlled exchange between signal distortion and noise suppression.
In the applications discussed, the technique realizes a broad range of
postfilter designs over which the phenomenon of noise suppression
dominates the phenomenon of signal distortion.

Although speech enhancement’ is an “ancient” art, we believe that 
the adaptive postfiltering technique discussed in the next section is 
novel. It can be used as a very general technique for speech enhance- 
ment. It can also be used very effectively in the specific context of 
ADPCM noise. The coefficients of the proposed postfilter are inspired 
by the coefficients of the adaptive predictor in ADPCM coding, and 
are in fact very closely related to these coefficients. 


III. POSTFILTERING OF ADPCM SPEECH 


Figures 2 and 3 provide block diagram descriptions of ADPCM with 
adaptive postfiltering. 

Figure 2 shows the decoder part of the system. Broken lines in the 
figure refer to parts of the system that compute the coefficients of the 
adaptive predictor and the adaptive postfilter. The coefficients used 
in the postfilter are differently scaled versions of the coefficients used 
in the adaptive predictor. These coefficients are already available in 
conventional ADPCM. In the case of a system with Backward-Adap- 
tive Prediction (APB), the predictor coefficients are updated in gra- 
dient-search algorithms driven by a recent history of the input and 
output of the ADPCM decoder. 

Fig. 2—Adaptive postfiltering of the output of an ADPCM decoder. The coefficients
of the postfilter are scaled versions of the coefficients of the adaptive predictor in
DPCM. In DPCM-APB, the predictor coefficients are obtained on the basis of
observations of a recent history of decoder input and decoder output. The parts of the circuit
that determine coefficient values are shown by broken lines.

A more complete block diagram of the ADPCM system appears in
Fig. 3. The quantizer and predictor assumed in this paper are both
backward-adaptive devices, implying that no special side information
needs to be explicitly transmitted to the ADPCM decoder to enable
adaptations of quantizer step size and predictor coefficients.

The adaptive quantizer assumed in this paper is one based on the 
use of a one-word memory,® but the results of this paper are fully 
expected to extend to a system that may use the more generalized 
version of this quantizer, as described in Ref. 2. In the quantizer used 
in this paper, the ratio of maximum step size to minimum step size is 
512, and the minimum step size is in the order of 27! times the peak- 
to-peak value of the speech input x(n). 

The adaptive predictor we assumed is a pole-zero predictor, similar
to that in Ref. 1. As Fig. 3 shows, the predicted value $\hat{x}(n)$ of input
$x(n)$ is a combination of two components, the outputs $\hat{x}_z(n)$ and $\hat{x}_p(n)$
of an all-zero predictor $B(z)$ and an all-pole predictor $A(z)$. Formally,

$$\hat{x}(n) = \hat{x}_z(n) + \hat{x}_p(n)$$

$$\hat{x}_p(n) = \sum_{j=1}^{2} a_j(n)\, y(n-j)$$

$$\hat{x}_z(n) = \sum_{j=1}^{6} b_j(n)\, u(n-j),$$

where $u(n)$ is the quantized version $Q[\cdot]$ of the prediction error and
$y(n)$ is the reconstructed output:

$$u(n) = Q[x(n) - \hat{x}(n)]$$

$$y(n) = \hat{x}(n) + u(n).$$

Fig. 3—Complete block diagram of an ADPCM system with a pole-zero predictor
[defined by all-pole and all-zero components A(z) and B(z)] and a pole-zero postfilter
[defined by components A’(z) and B’(z) that are derived from A(z) and B(z)]. The
extreme case of A’(z) = B’(z) = 0 results in conventional ADPCM without postfiltering.
The case of A’(z) = A(z) and B’(z) = B(z) results in a postfilter transfer function that
is identical to the input signal spectrum, as in Fig. 1b.
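
The prediction and reconstruction equations above can be sketched in a few lines. This is a minimal illustration that holds the coefficients fixed for one step (adaptation is omitted), and the function names and history layout are assumptions made for the example, not taken from the paper.

```python
def predict(a, b, y_hist, u_hist):
    """Pole-zero prediction: sum_j a_j*y(n-j) + sum_j b_j*u(n-j).

    y_hist[0] is y(n-1), y_hist[1] is y(n-2), and so on; likewise u_hist.
    """
    x_pole = sum(aj * yj for aj, yj in zip(a, y_hist))
    x_zero = sum(bj * uj for bj, uj in zip(b, u_hist))
    return x_zero + x_pole


def decode_step(a, b, y_hist, u_hist, u_n):
    """One decoder step: predict, add the quantized innovation, shift histories."""
    x_hat = predict(a, b, y_hist, u_hist)
    y_n = x_hat + u_n
    return y_n, [y_n] + y_hist[:-1], [u_n] + u_hist[:-1]


# Toy values: 2 pole coefficients, 6 zero coefficients.
a = [0.5, -0.25]
b = [0.1, 0.0, 0.0, 0.0, 0.0, 0.0]
y_n, y_hist, u_hist = decode_step(a, b, [1.0, 2.0], [1.0, 0, 0, 0, 0, 0], u_n=0.4)
print(y_n, y_hist, u_hist[:2])
```

Because the decoder forms its prediction only from past values of $y$ and $u$, the same routine could run at the encoder, which is what makes the backward-adaptive arrangement possible without side information.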


Adaptation of the predictor coefficients $a_j$ and $b_j$ follows the updating
algorithms’

$$a_j(n) = \lambda_j a_j(n-1) + \mu_j\,\mathrm{sgn}[u(n-1)]\,\mathrm{sgn}[y(n-1-j)],$$

$$j = 1, 2;\quad \lambda_1 = 511/512;\quad \lambda_2 = 255/256;\quad \mu_1 = \mu_2 = 0.008$$

and

$$b_j(n) = \lambda_j b_j(n-1) + \mu_j\,\mathrm{sgn}[u(n-1)]\,\mathrm{sgn}[u(n-1-j)],$$

$$j = 1 \text{ to } 6;\quad \lambda_j = 255/256 \text{ and } \mu_j = 0.008 \text{ for all } j.$$

The coefficients of the all-pole predictor are further controlled, for
stability reasons, by the following constraints:

$$-0.75 \le a_2 < 0.97$$

$$|a_{1,\max}| = 0.97 - a_2;\qquad |a_1| = \min\{|a_1|,\ |a_{1,\max}|\}$$

$$a_1 = |a_1|\,\mathrm{sgn}\, a_1.$$
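
The sign-sign updates and the stability constraints can be sketched as follows. The leakage factors and step size are those quoted above; the function names, the history layout, the convention sgn(0) = 0, and the inclusive clamp of the second pole coefficient at 0.97 are assumptions made for this illustration.

```python
LAMBDA_A = (511 / 512, 255 / 256)   # leakage factors for a_1, a_2
LAMBDA_B = 255 / 256                # leakage factor for all b_j
MU = 0.008                          # step size for all coefficients


def sgn(v):
    """Sign function with sgn(0) = 0 (an assumed convention)."""
    return (v > 0) - (v < 0)


def update_coefficients(a, b, y_hist, u_hist):
    """Return updated (a, b) for time n.

    y_hist[k] = y(n-1-k) and u_hist[k] = u(n-1-k); this needs
    len(y_hist) >= 3 and len(u_hist) >= 7.
    """
    s = sgn(u_hist[0])
    new_a = [LAMBDA_A[j - 1] * a[j - 1] + MU * s * sgn(y_hist[j]) for j in (1, 2)]
    new_b = [LAMBDA_B * b[j - 1] + MU * s * sgn(u_hist[j]) for j in range(1, 7)]

    # Stability constraints on the all-pole section.
    a2 = min(max(new_a[1], -0.75), 0.97)
    a1_max = 0.97 - a2
    a1 = min(abs(new_a[0]), a1_max) * sgn(new_a[0])
    return [a1, a2], new_b


a, b = update_coefficients([1.5, 0.5], [0.0] * 6, [1.0, 1.0, 1.0], [1.0] * 7)
print(a, b)
```

In this toy call the first pole coefficient starts outside the permitted region and is pulled back so that the two pole coefficients sum to 0.97.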


3.1 Coefficients of the postfilter 

A good starting point for designing the postfilter is the frequency
response of the inverse predictor. This is the system whose input and
output are the innovations $u(n)$ and the reconstruction $y(n)$. Its
transfer function, derivable from linear equations that relate $u(n)$,
$\hat{x}(n)$, and $y(n)$, is

$$\frac{Y(z)}{U(z)} = \frac{1 + B(z)}{1 - A(z)},$$

where

$$A(z) = \sum_{j=1}^{2} a_j z^{-j}; \qquad B(z) = \sum_{j=1}^{6} b_j z^{-j}.$$

The speech-like transfer function of Fig. 1b is approximated if the
postfilter response is identical to the function $Y(z)/U(z)$. This is
because the spectrum of the quantized innovations $u(n)$ is approximately
white and that of the reconstruction $y(n)$ is hopefully an
approximation to that of the input $x(n)$. More generally, as in Fig. 3,
we propose a postfilter transfer function

$$F(z) = \frac{1 + B'(z)}{1 - A'(z)},$$

where

$$A'(z) = \sum_{j=1}^{2} a_j \alpha^j z^{-j}; \qquad B'(z) = \sum_{j=1}^{6} b_j \beta^j z^{-j};$$

$$0 \le \alpha \le 1; \quad \text{and} \quad 0 \le \beta \le 1.$$

The extreme situation of Fig. 1b is approximated when α = β = 1.
In practice, this approximation can be quite poor because of the effects
of nonideal predictor adaptation, usually resulting in an inverse
predictor transfer function that is a flattened version of the input speech
spectrum, with poles and zeros that may also be significantly shifted
from their original locations. The case of α = β = 0 corresponds to
conventional ADPCM without any postfiltering. As we discuss in the
next section, intermediate designs provide different mixes of noise
suppression and speech distortion.

Figure 4 shows an illustrative spectrum of input speech and
compares it with the transfer functions F(z) for (α = 0.2; β = 1.0) and
(α = 1.0; β = 1.0). The latter condition simply corresponds to the
transfer function of the inverse predictor.
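
A direct-form sketch of the postfilter may help make the role of the two scaling factors concrete. It assumes that the j-th predictor coefficient is scaled by the j-th power of the corresponding factor (as read from the definitions of A’(z) and B’(z)), holds the predictor coefficients fixed over the block (in the actual system they adapt sample by sample), and omits the loudness rescaling; the function names are hypothetical.

```python
def scale_coefficients(coeffs, factor):
    """Return [factor**j * c_j for j = 1..len(coeffs)]."""
    return [(factor ** j) * c for j, c in enumerate(coeffs, start=1)]


def postfilter(y, a, b, alpha, beta):
    """Filter y through F(z) = (1 + B'(z)) / (1 - A'(z)) with fixed a, b."""
    a_s = scale_coefficients(a, alpha)
    b_s = scale_coefficients(b, beta)
    out = []
    for n in range(len(y)):
        acc = y[n]
        # All-zero part: 1 + B'(z) acts on the decoder output y.
        for j, bj in enumerate(b_s, start=1):
            if n - j >= 0:
                acc += bj * y[n - j]
        # All-pole part: 1 / (1 - A'(z)) feeds back the postfilter output.
        for j, aj in enumerate(a_s, start=1):
            if n - j >= 0:
                acc += aj * out[n - j]
        out.append(acc)
    return out


a = [0.5, 0.0]
b = [0.25, 0.0, 0.0, 0.0, 0.0, 0.0]
impulse = [1.0, 0.0, 0.0]
print(postfilter(impulse, a, b, alpha=0.0, beta=0.0))   # no postfiltering
print(postfilter(impulse, a, b, alpha=1.0, beta=1.0))   # inverse-predictor response
```

With both factors set to zero the routine returns its input unchanged, matching the statement that this case is conventional ADPCM without postfiltering.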


Fig. 4—(a) Input speech spectrum; and power transfer functions of postfilter with
scaling coefficients for (b) α = 0.2, β = 1.0, and for (c) α = 1.0, β = 1.0. The plot (c) is
merely the transfer function of the inverse predictor in the DPCM-APB system. [The
0-dB line is the same for (b) and (c) but different for (a).] Axes: relative power in
decibels versus frequency in kilohertz.


IV. EXPERIMENTAL RESULTS WITH ADPCM SPEECH

The speech inputs used in the experiment were the sentence-length
utterances “The lathe is a big tool” and “The chairman cast three
votes,” bandlimited to 3.2 kHz in each case and sampled at 8 kHz.
These inputs will be referred to as L8 and C8, respectively.


4.1 Signal-to-noise ratio results 


Figures 1a and 1b indicate that postfiltering can result in significant
improvements in s/n. Table I further demonstrates this for the
examples of 3-bit and 2-bit DPCM. The results tabulated are the values
of the s/n at the input of the postfilter and the s/n after postfiltering.
(See Fig. 3.) Table I also shows corresponding values of the segmental
s/n. In the ranges 0 ≤ α ≤ 1 and 0 ≤ β ≤ 1 for the coefficient scaling
factors, the greatest gains in the s/n are obtained when α = β = 1.
These gains are seen to be as high as 8.9 dB (in segmental s/n) for both
L8 and C8. The gains over the s/n at the input of the postfilter are always
lower for the design α = 0.2 and β = 1.0. But we note below that these
settings of α and β provide a subjectively desirable design.
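
The gains discussed here can be recomputed from the entries of Table I. The sketch below simply subtracts the tabulated values; the dictionary layout and the labels "full" (both scaling factors equal to 1.0) and "mild" (pole factor 0.2, zero factor 1.0) are hypothetical, and the bit-rate labels of the C8 rows are read as 3 and 2 to match the L8 rows.

```python
# (s/n, segmental s/n) in dB, transcribed from Table I.
# Keys are (input, bits per sample).
TABLE_I = {
    ("L8", 3): {"input": (21.7, 23.6), "full": (28.9, 32.5), "mild": (27.0, 28.1)},
    ("L8", 2): {"input": (15.2, 17.1), "full": (20.9, 25.3), "mild": (19.5, 21.0)},
    ("C8", 3): {"input": (18.9, 20.7), "full": (27.2, 29.6), "mild": (23.9, 24.6)},
    ("C8", 2): {"input": (12.7, 14.9), "full": (20.0, 22.9), "mild": (17.1, 18.2)},
}


def gains(row, design):
    """Return (s/n gain, segmental s/n gain) in dB for one table row."""
    return tuple(round(after - before, 1)
                 for after, before in zip(row[design], row["input"]))


for key, row in TABLE_I.items():
    print(key, "full:", gains(row, "full"), "mild:", gains(row, "mild"))
```

The largest entry produced this way is the 8.9-dB segmental gain at 3 bits/sample, which occurs for both inputs, and every "mild" gain is smaller than the corresponding "full" gain.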


4.2 Subjective results 


Tables II and III provide the results of a subjective test involving
14 listeners, including 9 from the AT&T Bell Laboratories Acoustics
Research department and 5 listeners who had no prior exposure to
speech coding experimentation or testing. A total of eight stimuli were
included in the test. These included 4-bit ADPCM without postfiltering,
3-bit ADPCM with six postfiltering conditions (including the
no-postfiltering case of α = β = 0), and 4-bit ADPCM with 6-kHz sampling
and a substandard speech bandwidth of W = 2.6 kHz. This last condition
was included to provide a 4-bit, 24-kb/s alternative to the 3-bit
ADPCM stimuli, all of which also had a bit rate of 24 kb/s. The values
of α and β used in the test were selected on the basis of a pilot test
that identified the interesting ranges of these parameters from the
point of view of perceived mixes of noise suppression and speech
distortion.


Table I—Values of s/n at input of postfilter and after postfiltering
(see Fig. 3). Numbers in parentheses are corresponding values of
segmental s/n ratio

                          s/n at input     s/n after postfiltering (dB)
Input   R (bits/sample)   of postfilter    α = 1.0, β = 1.0   α = 0.2, β = 1.0
                          (dB)
L8      3                 21.7 (23.6)      28.9 (32.5)        27.0 (28.1)
L8      2                 15.2 (17.1)      20.9 (25.3)        19.5 (21.0)
C8      3                 18.9 (20.7)      27.2 (29.6)        23.9 (24.6)
C8      2                 12.7 (14.9)      20.0 (22.9)        17.1 (18.2)


Table II—Number of wins in a round-robin tournament involving
eight coding conditions and 14 listeners, where the maximum
possible score is 196 for any given coding condition

0, 0    0.2, 1.0    0.4, 0.8    0.4, 0.6    0.6, 0.6    0.6, 0.4
75      118         116         104         113         106
80      129         120         106         117         102


Table III—Rank ordering of coding conditions by the group of 14
listeners and by a subgroup of 9 listeners from the Acoustics
Research Department

Sample       α, β:  0, 0    0.2, 1.0    0.4, 0.8    0.4, 0.6    0.6, 0.6    0.6, 0.4
L8 (G14)*           2       3           6           4           5           8
L8 (G9)*            3       2           6           4           5           8
C8 (G14)            1       2           5           3           6           8
C8 (G9)             1       3           5           4           6           8

* G14: group of 14 listeners; G9: group of 9 listeners.

In general, use of postfiltering results in an amplification of the 
speech signal as suggested in Fig. 1b. The postfiltered speech stimuli 
were therefore appropriately scaled down to mitigate differences in 
stimulus loudness. 

The subjective test involved an exhaustive pairwise comparison of
all possible stimulus pairs, with each pair appearing at random places
in the test once in each possible order of presentation. The total
number of AB comparisons was therefore 784 (8 × 7 = 56 possible
stimulus pairs for each of 14 listeners).

Table II shows, separately for inputs L8 and C8, the total number
of wins of each stimulus, with a maximum possible score of 196 for
each stimulus [a maximum score of 2 × (8 - 1) = 14 for each of 14
listeners]. It is seen that the worst two coding conditions stand apart from
the rest. These conditions are 3-bit ADPCM with no postfiltering and
4-bit ADPCM with 2.6-kHz bandwidth speech input (and output).
This latter condition gets a particularly low total score. Table II also
shows that the above results are not very different for the inputs L8
and C8.
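
The counting behind the tournament figures can be checked with a few lines. This is only a bookkeeping sketch for one test input; the function name is hypothetical.

```python
def round_robin_counts(n_stimuli, n_listeners):
    """Return (ordered pairs, total AB comparisons, maximum wins per stimulus)."""
    ordered_pairs = n_stimuli * (n_stimuli - 1)        # each pair in both orders
    comparisons = ordered_pairs * n_listeners
    max_wins = 2 * (n_stimuli - 1) * n_listeners       # a stimulus wins every time it appears
    return ordered_pairs, comparisons, max_wins


pairs, comparisons, max_wins = round_robin_counts(n_stimuli=8, n_listeners=14)
print(pairs, comparisons, max_wins)
```

For eight stimuli and 14 listeners this gives 56 ordered pairs per listener and a maximum score of 196 per stimulus, in agreement with the caption of Table II.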

Table III shows, separately for inputs L8 and C8, the rank ordering
of the eight coding conditions in the subjective test. Results are shown
separately for the total group of 14 listeners and the group of 9 listeners
from the Acoustics Research department. It is seen that the rankings
are not significantly different for the two populations. The best setting
of the coefficient scaling parameters is defined by


α = 0.2; β = 1.0


in each case, and for both L8 and C8. With input L8, 3-bit ADPCM
postfiltered as above is ranked a close second to 4-bit ADPCM speech
of equal bandwidth. In fact, the 4-bit ADPCM coder is ranked only
fourth when the results of all 14 listeners are pooled together. The
second and third ranks in this category belong to postfilters with the
design (α = 0.4; β = 0.8) and (α = 0.6; β = 0.6). The preference for the
design (α = 0.2; β = 1.0) has a simple interpretation. It suggests a
postfilter transfer function that mimics the approximate speech
spectrum (the inverse predictor function) very closely at the zeros of that
spectrum (β = 1.0), but very loosely at the poles (α = 0.2). This
suggests a condition that seeks to maximize background noise
suppression and minimize perceived speech distortion. In the case of
voiced speech segments, the poles tend to correspond to formant
frequencies and the value of α = 0.2 prevents an undue emphasis of
the higher-amplitude spectral peaks, a situation that was indeed
encountered in the example of Fig. 1b.


4.3 Enhancement of 2-bit ADPCM 


As we see in Table I, the s/n gains due to postfiltering are equally
significant for both 3-bit ADPCM and 2-bit ADPCM. Perceptually,
however, the general noise level in 2-bit ADPCM speech is such that
a useful degree of noise suppression requires the design of (α = 1.0;
β = 1.0). With this design, the speech distortion introduced by the
postfilter is also substantial. For this reason, the case of 2-bit ADPCM
is not considered to be of sufficient practical importance to pursue
formal subjective testing. Informal testing shows, however, that the
design of (α = 1.0; β = 1.0) is again preferable to conventional 2-bit
ADPCM (α = 0; β = 0).


V. ENHANCEMENT OF SPEECH DEGRADED BY ADDITIVE WHITE
GAUSSIAN NOISE

The specific postfiltering algorithm of Fig. 1b (α = 1.0; β = 1.0) was
also applied to speech degraded by additive white Gaussian noise. The
input speech was L8, the speech-to-noise ratios ranged from -3 dB to
17 dB, and all coefficients were obtained simply by simulating the
easily available case of 5-bit ADPCM, a bit rate high enough to
introduce very little quantization noise in comparison with the levels
of Gaussian noise being studied. We find the postfiltering algorithm
provides a very useful enhancement of noisy speech if the input to the
postfilter has an s/n of at least +3 dB. For lower values of s/n,
postfiltering provides noise suppression, but at the cost of substantial
distortion of the speech itself.

The purpose of this paper is to discuss theoretical, as well as psychophysical,
aspects of using the Itakura-Saito type of measures for evaluating the quality 
of coded speech. We present psychoacoustic interpretations of the measures 
and identify their effectiveness as well as limitations within the theoretical 
framework of a generalized waveform coder distortion model. The discussions 
then point out some specific issues to be resolved through psychoacoustic 
research effort. 


I. INTRODUCTION


A “good” speech quality measure is central to progress in the 
research and development of speech processing systems. In speech 
coding, for example, we need a quality measure to provide insight into 
different distortions that are present in a coder output. If such a 
measure existed, it would help speech researchers identify how various 
kinds of distortions could be traded in order to improve the perceptual 
performance of the speech coder. In an engineering context, a measure 
that indicates the perceptual quality is a criterion to be optimized in 
speech coder design. Without such a measure, tuning coding schemes 
to achieve optimal quality is not a trivial task and the performance 
cannot be conveniently evaluated. 

Speech quality assessment, however, involves subjective, psycholog- 
ical attributes of human perception, an area in which mathematicians 
and engineers are usually not well versed. Thus, speech quality
evaluation has never been established satisfactorily in mathematical terms.
The conventional signal-to-noise ratio (s/n), widely used in character- 
izing signal transmission/reception environments, is an ineffective 
measure of speech quality. Several other measurement methods and 
parameters, such as the isopreference method! and the subjective 
s/n,” have been proposed during the last two decades. General surveys 
of classical approaches can be found in Refs. 1 and 3. Reference 2 and 
its references also provide a summary of past efforts. Among these 
approaches, one particular class of measures based upon the Itakura- 
Saito measure has attracted engineers and scientists taking an analyt- 
ical approach toward the problem. The Itakura-Saito measure and its 
variations, such as the Itakura or log likelihood ratio measure* and 
the likelihood ratio measure,’ have been employed in noise studies by 
Sambur and Jayant;® in vocoder designs by Juang et al.’ and Wong et 
al.;3 in automatic speech recognition by Itakura® and Rabiner;° and 
as quality measures by Goodman et al.,!' Crochiere et al.,!? and 
Barnwell et al.!° 

Although successful applications of this class of measure are wide- 
spread in speech processing, none of them comes close to being justified 
as the speech quality measure. This paper attempts to identify the 
effectiveness as well as limitations of using this class of measure for 
speech quality within the theoretical framework of a generalized 
waveform coder distortion model.¹⁴ We will further point out that
such limitations also exist in current automatic speech recognizers 
that rely upon spectral matching. We then present some considera- 
tions relating to psychoacoustic studies, aiming at a better understand- 
ing of the fundamental concepts of speech quality in the presence of 
spectral distortion. These considerations will help direct future rele- 
vant psychoacoustic experiments for studying the dynamics of speech 
perception. 


II. PRELIMINARIES


Let $s(i)$ and $s'(i)$ be two sampled speech signals, and let $x_n(i)$ and
$x'_n(i)$ be two windowed segments, or frames, of $s(i)$ and $s'(i)$, respectively.
Segments $x_n(i)$ and $x'_n(i)$ are obtained by applying a window
function $w(i)$, with $w(i) = 0$ for $i < 0$ and $i \ge N$, to the speech signals
at instant $n$; in particular,

$$x_n(i) = w(i)\,s(i + n) \tag{1}$$

and

$$x'_n(i) = w(i)\,s'(i + n). \tag{2}$$
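
The framing of (1) and (2) can be sketched in a few lines of Python. This is only an illustration: the Hamming window, the frame length of 160 samples, the helper names `hamming` and `frame`, and the zero padding beyond the end of the signal are assumptions made here, not choices stated in the text.

```python
import math

def hamming(N):
    """A Hamming window of length N (one possible choice of w(i))."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (N - 1)) for i in range(N)]

def frame(s, n, w):
    """Return x_n(i) = w(i) * s(i + n) for i = 0, ..., N-1, following (1).

    Samples outside the recorded signal are taken as zero (an assumption).
    """
    return [w[i] * (s[i + n] if 0 <= i + n < len(s) else 0.0)
            for i in range(len(w))]

# A synthetic stand-in for a sampled speech signal s(i).
s = [math.sin(0.3 * i) for i in range(400)]
w = hamming(160)
x_100 = frame(s, 100, w)   # the frame located at n = 100
```

Applying the same `frame` call to a second signal gives the primed frame of (2).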


The windowing operation greatly facilitates using spectral representations
for speech analysis because speech is considered a quasi-stationary
signal. We denote the z-transforms of $x_n(i)$ and $x'_n(i)$ by
$X_n(z)$ and $X'_n(z)$, respectively. The Fourier transform is obtained by
evaluating the z-transform on the unit circle, i.e., $z = e^{j\omega}$, and thus the
notations $X_n(e^{j\omega})$ and $X'_n(e^{j\omega})$ are used to designate the Fourier
transforms of the two windowed signals, respectively. For every such pair of
spectral representations, $X_n(e^{j\omega})$ and $X'_n(e^{j\omega})$, a spectral distortion
$\rho[X_n, X'_n]$ can be defined to measure the dissimilarity between $X_n(e^{j\omega})$
and $X'_n(e^{j\omega})$. In speech analysis, one particularly interesting distortion
measure is the Itakura-Saito measure, which is defined as

$$\rho_{IS}[X_n, X'_n] \triangleq \int_{-\pi}^{\pi} \left[ e^{\Lambda(\omega)} - \Lambda(\omega) - 1 \right] \frac{d\omega}{2\pi}, \tag{3}$$

where

$$\Lambda(\omega) = \log |X_n(e^{j\omega})|^2 - \log |X'_n(e^{j\omega})|^2. \tag{4}$$


This mathematically tractable distortion measure has been successfully
employed in vocoder designs.⁷ Detailed analytical properties of
the measure can be found in Refs. 4 and 5. 
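
As a numerical illustration of (3) and (4), the integral can be approximated by an average over a uniform frequency grid. The sketch below is not taken from the cited references; the function name `itakura_saito`, the grid size, and the two test spectra are assumptions chosen only to show that the measure vanishes for identical spectra and is not symmetric in its arguments.

```python
import math

def itakura_saito(p, p_prime):
    """Approximate (3): the mean of exp(L) - L - 1 over the grid,
    where L = log p - log p_prime as in (4).

    p and p_prime are power spectra sampled uniformly on [-pi, pi).
    """
    total = 0.0
    for a, b in zip(p, p_prime):
        lam = math.log(a) - math.log(b)
        total += math.exp(lam) - lam - 1.0
    return total / len(p)

K = 512
omega = [-math.pi + 2.0 * math.pi * k / K for k in range(K)]
flat = [1.0] * K                                   # a white spectrum
tilted = [1.0 + 0.5 * math.cos(w) for w in omega]  # a gently tilted spectrum

d_same = itakura_saito(flat, flat)     # identical spectra
d_fwd = itakura_saito(flat, tilted)    # flat taken as the original
d_bwd = itakura_saito(tilted, flat)    # tilted taken as the original
```

The two directions give different values, which is the asymmetry that the later discussion of forward and backward measurements relies on.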

It has been shown in short-time Fourier analysis that a signal can 
be reconstructed from a properly time-sampled sequence of short-time 
Fourier transforms.’® We can, thus, further represent the two signal 
sequences, $s(i)$ and $s'(i)$, by their corresponding short-time spectral
sequences. Using $\oplus$ to denote the reconstruction process,

$$\{s(i)\} = \cdots \oplus X_{(n-1)l}(z) \oplus X_{nl}(z) \oplus X_{(n+1)l}(z) \oplus \cdots = \oplus X_n(z), \tag{5}$$

and

$$\{s'(i)\} = \cdots \oplus X'_{(n-1)l}(z) \oplus X'_{nl}(z) \oplus X'_{(n+1)l}(z) \oplus \cdots = \oplus X'_n(z). \tag{6}$$

In the above, $l$ is the underlying interval for short-time Fourier analysis
and has been dropped in the final expressions without ambiguity. Such
a representation allows us to characterize the dissimilarity between
$s(i)$ and $s'(i)$ in terms of distortion measures obtained from short-time
spectral representations. A distortion sequence between two
speech signals is then defined as

$$\rho[s(i), s'(i)] = \{\rho_n\}, \tag{7}$$

where $n$ is, as in (1) and (2), the frame index designating the window
location, and

$$\rho_n = \rho[X_n, X'_n].$$

We will call $\rho_n$ spectral distortion and $\{\rho_n\}$ a distortion sequence.




Extending the definition (3) to (7), then, we have a sequence of 
Itakura-Saito distortions. 

The Itakura-Saito distortion measure defined by (3) and (4) is in 
fact the distortion measure for all-pole signal modeling; it was origi- 
nally introduced as an error-matching function in maximum likelihood 
estimation of autoregressive spectral models.¹⁷ Therefore, we shall
confine ourselves to the analysis of Mth-order all-pole signal models 
despite the fact that a distortion measure could be more general. 
Several important results of the measure related to all-pole signal 
modeling are: 


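1.
$$\rho_{IS}[X_n, \sigma_n/A_n] = \frac{\alpha_n}{\sigma_n^2} + \log \sigma_n^2 - \log \alpha_{n,\infty} - 1, \tag{8}$$

where

$$\alpha_n \triangleq \int_{-\pi}^{\pi} |X_n(e^{j\omega}) A_n(e^{j\omega})|^2 \frac{d\omega}{2\pi}, \tag{9}$$

$$\alpha_{n,\infty} \triangleq \exp\left\{ \int_{-\pi}^{\pi} \log |X_n(e^{j\omega})|^2 \frac{d\omega}{2\pi} \right\}, \tag{10}$$

$\sigma_n$ is a scalar, called the gain term, and

$$A_n(z) = 1 + \sum_{i=1}^{M} a_{i,n} z^{-i}. \tag{11}$$

2.
$$\rho_{IS}[\sigma_n/A_n, \sigma'_n/A'_n] = \frac{\sigma_n^2}{\sigma_n'^2} \int_{-\pi}^{\pi} \frac{|A'_n(e^{j\omega})|^2}{|A_n(e^{j\omega})|^2} \frac{d\omega}{2\pi} + \log \sigma_n'^2 - \log \sigma_n^2 - 1, \tag{12}$$

which reduces to

$$\rho_{IS}[\sigma_n/A_n, \sigma_n/A'_n] = \int_{-\pi}^{\pi} \frac{|A'_n(e^{j\omega})|^2}{|A_n(e^{j\omega})|^2} \frac{d\omega}{2\pi} - 1 = \rho_{IS}[1/A_n, 1/A'_n] \tag{13}$$

when the gain terms are identical. $A'_n(z)$ takes the same form as $A_n(z)$
in (11). In the above expressions, we have assumed that $A_n(z)$ and
$A'_n(z)$ have all their roots within the unit circle. Therefore,¹⁸

$$\int_{-\pi}^{\pi} \log |A_n(e^{j\omega})|^2 \frac{d\omega}{2\pi} = \int_{-\pi}^{\pi} \log |A'_n(e^{j\omega})|^2 \frac{d\omega}{2\pi} = 0.$$
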

For clarity, we further define the likelihood ratio measure and the 
log likelihood ratio (or Itakura) measure as follows: 


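1. Likelihood ratio measure,⁵ $\rho_{LR}[X_n, X'_n]$:

$$\rho_{LR}[X_n, X'_n] \triangleq \rho_{IS}\left[\frac{1}{A_n}, \frac{1}{A'_n}\right]. \tag{14}$$

2. Log likelihood ratio (Itakura) measure,⁴ $\rho_{LLR}[X_n, X'_n]$:

$$\rho_{LLR}[X_n, X'_n] \triangleq \rho_{IS}\left[\frac{\sqrt{\alpha_n}}{A_n}, \frac{\sqrt{\alpha'_n}}{A'_n}\right] = \log\left(\frac{\alpha'_n}{\alpha_n}\right). \tag{15}$$

In defining the above two measures, $A_n(z)$ and $A'_n(z)$ are the optimal
Mth-order inverse filters of $X_n(z)$ and $X'_n(z)$, respectively.¹⁸ Furthermore,

$$\alpha'_n = \int_{-\pi}^{\pi} |X_n(e^{j\omega}) A'_n(e^{j\omega})|^2 \frac{d\omega}{2\pi}, \tag{16}$$

and

$$\alpha_n = \int_{-\pi}^{\pi} |X_n(e^{j\omega}) A_n(e^{j\omega})|^2 \frac{d\omega}{2\pi}. \tag{17}$$

Note that $\alpha_n$ is the minimum Mth-order prediction residual energy
pertaining to signal $X_n(z)$.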
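
The quantities in (14) through (17) can be computed from frame autocorrelations. The sketch below is illustrative and rests on several assumptions: the optimal inverse filters are obtained with the autocorrelation method and the Levinson-Durbin recursion, the two frames are synthetic (white noise driven through hypothetical second-order all-pole filters), and the likelihood ratio is evaluated as the residual energy ratio minus one, which is the identity derived later in (25) and (26). All function names are hypothetical.

```python
import math
import random

def autocorr(x, M):
    """Autocorrelation lags r(0..M) of a finite frame x."""
    return [sum(x[i] * x[i + k] for i in range(len(x) - k)) for k in range(M + 1)]

def levinson(r, M):
    """Levinson-Durbin recursion.

    Returns ([1, a_1, ..., a_M], alpha): the optimal Mth-order inverse
    filter in the form of (11) and its minimum residual energy.
    """
    a = [1.0] + [0.0] * M
    alpha = r[0]
    for m in range(1, M + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / alpha                  # m-th reflection coefficient
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        alpha *= 1.0 - k * k
    return a, alpha

def filtered_energy(a, r):
    """Energy of the frame after filtering by A(z), computed as a^T R a."""
    n = len(a)
    return sum(a[i] * a[j] * r[abs(i - j)] for i in range(n) for j in range(n))

def ar_frame(coeffs, length, rng):
    """Synthetic frame: white noise driven through 1/A(z) (illustration only)."""
    y = []
    for _ in range(length):
        v = rng.gauss(0.0, 1.0)
        for j, c in enumerate(coeffs[1:], start=1):
            if len(y) >= j:
                v -= c * y[-j]
        y.append(v)
    return y

rng = random.Random(0)
M = 2
x = ar_frame([1.0, -1.2726, 0.81], 400, rng)     # stands in for x_n(i)
x_prime = ar_frame([1.0, -0.5, 0.2], 400, rng)   # stands in for x'_n(i)

r = autocorr(x, M)
a, alpha_n = levinson(r, M)                      # A_n(z) and alpha_n of (17)
a_prime, _ = levinson(autocorr(x_prime, M), M)   # A'_n(z)
alpha_n_prime = filtered_energy(a_prime, r)      # alpha'_n of (16)

rho_lr = alpha_n_prime / alpha_n - 1.0           # likelihood ratio measure
rho_llr = math.log(alpha_n_prime / alpha_n)      # log likelihood ratio, (15)
```

Because $\alpha_n$ is the minimum residual energy for the frame, the ratio is never below one, so both measures are nonnegative, and the log measure never exceeds the likelihood ratio measure.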

The Itakura-Saito distortion between the input and output signals
of a linear system $H(e^{j\omega})$ can be easily calculated. Denoting the input
power spectrum as $|X_n(e^{j\omega})|^2$, we have the output power spectrum
$|X'_n(e^{j\omega})|^2 = |X_n(e^{j\omega})H(e^{j\omega})|^2$. Therefore,

$$\Lambda(\omega) = \log |X_n(e^{j\omega})|^2 - \log |X_n(e^{j\omega})H(e^{j\omega})|^2 = -\log |H(e^{j\omega})|^2, \tag{18}$$

and hence,

$$\rho_{IS}[X_n, X'_n] = \int_{-\pi}^{\pi} \left[ \frac{1}{|H(e^{j\omega})|^2} + \log |H(e^{j\omega})|^2 - 1 \right] \frac{d\omega}{2\pi}. \tag{19}$$

Of particular interest here is a class of $H(e^{j\omega})$ of the form

$$H(e^{j\omega}) = \frac{A_n(e^{j\omega})}{B_n(e^{j\omega})}, \tag{20}$$

where $A_n(z)$, as defined above, is the optimal Mth-order inverse filter
of $X_n(z)$ and $B_n(z)$ is another Mth-order Finite Impulse Response
(FIR) filter, taking the same form as (11). We also assume that $A_n(z)$
and $B_n(z)$ both have all their roots within the unit circle. The input/output
relationship of the system is illustrated in Fig. 1. Since $A_n(z)$
is the optimal Mth-order inverse filter of $X_n(z)$, $E_n(z)$ is then the
residual signal. $X'_n(z)$ is obtained by driving another all-pole filter
$1/B_n(z)$ with such a residual signal. The distortion between $X_n(z)$ and




Fig. 1—A particular class of linear system in which $A_n(z)$ is the optimal Mth-order
inverse filter of $X_n(z)$.


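$X'_n(z)$ under this condition is thus

$$\rho_{IS}[X_n, X'_n] = \int_{-\pi}^{\pi} \frac{|B_n(e^{j\omega})|^2}{|A_n(e^{j\omega})|^2} \frac{d\omega}{2\pi} - 1 = \rho_{IS}\left[\frac{1}{A_n}, \frac{1}{B_n}\right], \tag{21}$$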
which is determined by the two all-pole filters, and has the same 
expression as the likelihood ratio measure of (14). This result gives us 
a convenient means of modifying a signal in order to achieve a 
prescribed distortion level from the original signal. Detailed discus- 
sions in Section IV are based upon this concept. It is, however, 
important to note that in eq. (21), $B_n(z)$ is not unique, and is not
necessarily the optimal Mth-order inverse filter of the output signal
$X'_n(z)$. Put simply, within the Mth-order autoregressive
model framework, a prescribed Itakura-Saito spectral distortion can 
be obtained from a given signal through proper filtering operations, 
which will be convenient to realize. 
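
A quick numerical check makes the step from (19) to (21) concrete. The sketch below evaluates both expressions on a uniform frequency grid for $H = A_n/B_n$; the two second-order filters are hypothetical examples with all roots inside the unit circle, and the helper name `power` is an assumption. The agreement reflects the fact that the log term in (19) integrates to zero for such filters.

```python
import cmath
import math

def power(coeffs, w):
    """|P(e^{jw})|^2 for P(z) = sum_i coeffs[i] * z^{-i}."""
    return abs(sum(c * cmath.exp(-1j * w * i) for i, c in enumerate(coeffs))) ** 2

A = [1.0, -1.2, 0.72]   # hypothetical A_n(z); roots inside the unit circle
B = [1.0, -0.9, 0.40]   # hypothetical B_n(z); roots inside the unit circle

K = 2048
grid = [-math.pi + 2.0 * math.pi * k / K for k in range(K)]

# Eq. (19) with H = A/B, so that |H|^2 = |A|^2 / |B|^2.
d19 = 0.0
for w in grid:
    h2 = power(A, w) / power(B, w)
    d19 += 1.0 / h2 + math.log(h2) - 1.0
d19 /= K

# Eq. (21): the mean of |B|^2 / |A|^2 over the grid, minus one.
d21 = sum(power(B, w) / power(A, w) for w in grid) / K - 1.0
```

Both numbers agree to within rounding error, so in this setting the distortion is fixed by the pair of all-pole filters alone.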


III. A WAVEFORM CODER MODEL


Figure 2 shows a block diagram of the waveform coder distortion 
model used by Crochiere et al. for an interpretation of the log likelihood 
ratio measure.¹² This coder distortion model is composed of a time-varying
linear filter $h(i)$, to model the “linearly correlated” distortions,
and an additive noise source q(i), to account for the nonlinear, uncor- 
related distortions in the coder. Since the model attempts to split the 
components of distortion, it was expected that distinctively different 


Fig. 2—Waveform coder distortion model.


Fig. 3—Measuring coder performance with the likelihood ratio in a forward manner. 


perceptual effects could be meaningfully studied separately with such 
a model. 

Measurement of the coder performance with the likelihood ratio 
measure is shown in Fig. 3, which introduces the notion of inverse 
filtering. We use the likelihood ratio measure, rather than the Itakura- 
Saito measure, because we try to avoid, in the following discussions, 
extra complications in speech quality measurement due to amplifica- 
tion or attenuation. We follow the notation of Section II, except that 
the subscript indicating the frame index has been dropped, since, for 
most of the subsequent expressions, signal stationarity is assumed. 
We shall reinstate the frame index wherever necessary. The two 
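parameters, $\alpha$ and $\alpha'$, are defined as in (17) by

$$\alpha = \int_{-\pi}^{\pi} |X(e^{j\omega}) A(e^{j\omega})|^2 \frac{d\omega}{2\pi} \tag{22}$$

and

$$\alpha' = \int_{-\pi}^{\pi} |X'(e^{j\omega}) A'(e^{j\omega})|^2 \frac{d\omega}{2\pi}, \tag{23}$$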


where $A(z)$ and $A'(z)$ are the optimal Mth-order inverse filters of $X(z)$
and $X'(z)$, respectively. In other words, $\alpha$ and $\alpha'$ are the minimum
Mth-order prediction residual energies corresponding to the $x(i)$ and $x'(i)$
sequences, respectively. The energy of $v'(i)$, denoted by $\beta'$, is then

$$\beta' = \frac{1}{\alpha} \int_{-\pi}^{\pi} |X(e^{j\omega}) A'(e^{j\omega})|^2 \frac{d\omega}{2\pi}. \tag{24}$$

The energy of $u'(i)$, on the other hand, is unity due to the normalization
factor $1/\sqrt{\alpha'}$ and eq. (23). The energy ratio of the two filtered
signals, $v'(i)$ and $u'(i)$, is then equal to $\beta'$. By substituting (22) into
(24), we have

$$\beta' = \frac{\displaystyle\int_{-\pi}^{\pi} |X(e^{j\omega}) A'(e^{j\omega})|^2 \frac{d\omega}{2\pi}}{\displaystyle\int_{-\pi}^{\pi} |X(e^{j\omega}) A(e^{j\omega})|^2 \frac{d\omega}{2\pi}}. \tag{25}$$

The right-hand side of eq. (25) is the so-called likelihood ratio, and it
can be reduced to

$$\beta' = \int_{-\pi}^{\pi} \frac{|A'(e^{j\omega})|^2}{|A(e^{j\omega})|^2} \frac{d\omega}{2\pi}, \tag{26}$$

since both $A(z)$ and $A'(z)$ are Mth-order FIR filters, and the first
$M + 1$ autocorrelation coefficients of the $\{x(i)/\sqrt{\alpha}\}$ sequence are
equal to those of the impulse response of $1/A(z)$. Therefore, the
likelihood ratio measure of (14) can be expressed in terms of the energy
ratio of the two filtered outputs, $v'(i)$ and $u'(i)$, and

$$\rho_{LR}[X, X'] \triangleq \rho_{IS}\left[\frac{1}{A}, \frac{1}{A'}\right] = \int_{-\pi}^{\pi} \frac{|A'(e^{j\omega})|^2}{|A(e^{j\omega})|^2} \frac{d\omega}{2\pi} - 1 = \beta' - 1. \tag{27}$$


Alternatively, we may replace the filter $A'(z)$ by $A(z)$, the inverse
filter of the $x(i)$ sequence, as shown in Fig. 4. In such a case, the
energy of $u(i)$, denoted by $\gamma$, is the likelihood ratio, and the distortion
measurement is accomplished in a reversed direction, i.e.,

$$\rho_{LR}[X', X] \triangleq \rho_{IS}\left[\frac{1}{A'}, \frac{1}{A}\right] = \int_{-\pi}^{\pi} \frac{|A(e^{j\omega})|^2}{|A'(e^{j\omega})|^2} \frac{d\omega}{2\pi} - 1 = \gamma - 1. \tag{28}$$

Fig. 4—Measuring coder performance with the likelihood ratio in a backward manner.


The interpretation of the log likelihood ratio as a coder performance 
measure by Crochiere et al. follows the comparison order of eq. (28).¹²
More specifically, the measure they discussed was $\log \gamma$, instead of
$\gamma - 1$. The difference between the log likelihood ratio measure and
the likelihood ratio measure may be insignificant in terms of measure- 
ment. However, the likelihood ratio measure of eq. (14) appears to 
correspond more closely to the Itakura-Saito measure in representing 
the distortion relationship between the input and output signals of a 
particular class of linear systems. This was shown in eq. (21). 

We now express the measures within the coder model. Referring to 
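Fig. 2 and denoting the Fourier transforms of $h(i)$ and $q(i)$ by $H(e^{j\omega})$
and $Q(e^{j\omega})$, respectively, we have

$$X'(e^{j\omega}) = X(e^{j\omega})H(e^{j\omega}) + Q(e^{j\omega}). \tag{29}$$

Furthermore, since $x(i)$ and $q(i)$ are uncorrelated,

$$|X'(e^{j\omega})|^2 = |X(e^{j\omega})H(e^{j\omega})|^2 + |Q(e^{j\omega})|^2. \tag{30}$$

For simplicity we assume that $H(z)$ does not have poles and zeros on
the unit circle. From (24), the likelihood ratio distortion measured
from $\{x(i)\}$ to $\{x'(i)\}$ is thus

$$\rho_{LR}[X, X'] = \beta' - 1 = \frac{1}{\alpha} \int_{-\pi}^{\pi} \frac{|A'(e^{j\omega})|^2}{|H(e^{j\omega})|^2} \left\{ |X'(e^{j\omega})|^2 - |Q(e^{j\omega})|^2 \right\} \frac{d\omega}{2\pi} - 1. \tag{31}$$

On the other hand, the distortion measured from $\{x'(i)\}$ to $\{x(i)\}$ is

$$\rho_{LR}[X', X] = \gamma - 1 = \frac{1}{\alpha'} \int_{-\pi}^{\pi} |A(e^{j\omega})|^2 \left\{ |X(e^{j\omega})H(e^{j\omega})|^2 + |Q(e^{j\omega})|^2 \right\} \frac{d\omega}{2\pi} - 1. \tag{32}$$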


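Note that $Q(e^{-j\omega})$ is the complex conjugate of $Q(e^{j\omega})$ since $q(i)$ is real.

Different distortion components of the coder may be decoupled in
the following way:

1. Additive noise distortion, $\rho_a$, is defined when there is no correlated
spectral distortion, i.e.,

$$\rho_a^f \triangleq \rho_{LR}[X, X'] \Big|_{|H(e^{j\omega})| = 1} = \frac{1}{\alpha} \int_{-\pi}^{\pi} |A'(e^{j\omega})|^2 \left\{ |X'(e^{j\omega})|^2 - |Q(e^{j\omega})|^2 \right\} \frac{d\omega}{2\pi} - 1 = \frac{\alpha'}{\alpha} - 1 - \frac{1}{\alpha} \int_{-\pi}^{\pi} |A'(e^{j\omega}) Q(e^{j\omega})|^2 \frac{d\omega}{2\pi} \tag{33}$$

and

$$\rho_a^b \triangleq \rho_{LR}[X', X] \Big|_{|H(e^{j\omega})| = 1} = \frac{1}{\alpha'} \int_{-\pi}^{\pi} |A(e^{j\omega})|^2 \left\{ |X(e^{j\omega})|^2 + |Q(e^{j\omega})|^2 \right\} \frac{d\omega}{2\pi} - 1 = \frac{\alpha}{\alpha'} - 1 + \frac{1}{\alpha'} \int_{-\pi}^{\pi} |A(e^{j\omega}) Q(e^{j\omega})|^2 \frac{d\omega}{2\pi}. \tag{34}$$

In the above, the superscripts, f and b, denote the forward and
backward measurements, respectively.

2. Correlated spectral distortion, $\rho_s$, is defined when the additive
noise component vanishes, i.e.,

$$\rho_s^f \triangleq \rho_{LR}[X, X'] \Big|_{Q(e^{j\omega}) = 0} = \frac{1}{\alpha} \int_{-\pi}^{\pi} \frac{|X'(e^{j\omega}) A'(e^{j\omega})|^2}{|H(e^{j\omega})|^2} \frac{d\omega}{2\pi} - 1 \tag{35}$$

and

$$\rho_s^b \triangleq \rho_{LR}[X', X] \Big|_{Q(e^{j\omega}) = 0} = \frac{1}{\alpha'} \int_{-\pi}^{\pi} |X(e^{j\omega}) H(e^{j\omega}) A(e^{j\omega})|^2 \frac{d\omega}{2\pi} - 1. \tag{36}$$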

The above decomposition of the measure into additive noise and 
correlated spectral distortions provides a helpful means in cross- 
verification between the measure and many known perceptual attri- 
butes. In the following we shall discuss the merits as well as limitations 
of the above measure in measuring the perceptual quality of waveform-coded
speech signals. Such discussions point to some necessary psychophysical
experiments for a closer link between objective and subjective
measures.

3.1 Additive noise distortion


The key contribution of the uncorrelated additive noise, $q(i)$, appears
in the integral terms in (33) and (34). Let us consider (34), where
the integrand involves the inverse filter $A(z)$ for the input speech
signal.

The integral

$$\int_{-\pi}^{\pi} |A(e^{j\omega}) Q(e^{j\omega})|^2 \frac{d\omega}{2\pi}$$


is minimized subject to the constraint 


$$\int_{-\pi}^{\pi} |Q(e^{j\omega})|^2 \frac{d\omega}{2\pi} = P_q, \tag{37}$$


where $P_q$ is a constant, when $A(z)$ is the optimal (Mth-order) inverse
filter of the $q(i)$ sequence. In other words, for a given noise power, the
integral is minimized if the noise has the same spectral shape as the
input speech, within the Mth-order autoregressive signal modeling
framework. This appears to be in very good agreement with the results
of auditory masking that has been proposed as a method for improving
the perceived quality of digitally encoded speech.¹⁹,²⁰ The same observation
can also be made on (33), where the integrand involves $A'(z)$
instead of $A(z)$. $A'(z)$ is the optimal Mth-order inverse filter of the
encoded output sequence $x'(i)$. If $q(i)$ is truly uncorrelated with $x(i)$
(recall that $|H(e^{j\omega})| = 1$ here) and has the same spectral shape as
$x(i)$, then $A'(z)$ is, in fact, identical to $A(z)$. However, when exact
shaping of noise spectra is not achievable (as in most practical coder
systems), (33) and (34) lead to significantly different distortion measurements
since $\alpha'$ involves $A'(e^{j\omega})$, which demonstrates attributes of
$Q(e^{j\omega})$. The following example illustrates the difference between the
forward and the backward measurements.

Consider two signals, one being tonelike and the other being white
noise. These two signals are represented in terms of second-order all-pole
models as $1/A_t(z)$ and $1/A_w(z)$, where

$$A_t(z) = 1 - 1.2726\,z^{-1} + 0.81\,z^{-2} \tag{38}$$

and

$$A_w(z) = 1. \tag{39}$$

The two roots of $A_t(z)$ are $0.9\,e^{\pm j\pi/4}$, which indicate a resonance at
$\pi/4$ normalized frequency or at 1000 Hz when the sampling frequency
is 8000 Hz. These two all-pole models have corresponding reflection
coefficient vectors $k_t$ and $k_w$:¹⁸

$$k_t = [k_{t1}\ \ k_{t2}] = [-0.7\ \ 0.81] \tag{40}$$

and

$$k_w = [k_{w1}\ \ k_{w2}] = [0\ \ 0]. \tag{41}$$

Using eq. (7) of Ref. 7,

$$\rho_{LR}\left[\frac{1}{A_t}, \frac{1}{A_w}\right] = \frac{(k_{w1} - k_{t1})^2 (1 + k_{w2})^2}{(1 - k_{t1}^2)(1 - k_{t2}^2)} + \frac{(k_{w2} - k_{t2})^2}{1 - k_{t2}^2}, \tag{42}$$

we can easily calculate the distortion in each direction and obtain

$$\rho_{LR}\left[\frac{1}{A_t}, \frac{1}{A_w}\right] = 4.7 \tag{43}$$

and

$$\rho_{LR}\left[\frac{1}{A_w}, \frac{1}{A_t}\right] = 2.26. \tag{44}$$
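
The two values can be reproduced with a short script. The sketch below evaluates the second-order reflection coefficient formula of (42) in both directions, using the rounded coefficients of (40) and (41), and cross-checks the result against a direct numerical evaluation of the integral in (27). The function names and the grid size are assumptions; the step-up relation used to rebuild the filter from its reflection coefficients is the standard second-order one.

```python
import cmath
import math

def lr_reflection(k, k_prime):
    """Eq. (42) for second-order models: rho_LR[1/A, 1/A'] from the
    reflection coefficients k = (k1, k2) of A and k_prime of A'."""
    k1, k2 = k
    p1, p2 = k_prime
    return ((p1 - k1) ** 2 * (1.0 + p2) ** 2 / ((1.0 - k1 ** 2) * (1.0 - k2 ** 2))
            + (p2 - k2) ** 2 / (1.0 - k2 ** 2))

def to_filter(k):
    """Second-order step-up: A(z) = 1 + k1 (1 + k2) z^-1 + k2 z^-2."""
    k1, k2 = k
    return [1.0, k1 * (1.0 + k2), k2]

def power(coeffs, w):
    """|P(e^{jw})|^2 for P(z) = sum_i coeffs[i] * z^{-i}."""
    return abs(sum(c * cmath.exp(-1j * w * i) for i, c in enumerate(coeffs))) ** 2

def lr_numeric(a, a_prime, K=4096):
    """Mean of |A'|^2 / |A|^2 over a uniform frequency grid, minus one."""
    grid = [-math.pi + 2.0 * math.pi * k / K for k in range(K)]
    return sum(power(a_prime, w) / power(a, w) for w in grid) / K - 1.0

k_t = (-0.7, 0.81)   # tonelike model, eq. (40)
k_w = (0.0, 0.0)     # white noise model, eq. (41)

tone_to_noise = lr_reflection(k_t, k_w)   # approximately 4.70, eq. (43)
noise_to_tone = lr_reflection(k_w, k_t)   # approximately 2.26, eq. (44)
check = lr_numeric(to_filter(k_t), to_filter(k_w))
```

With the unrounded coefficient of (38) the numbers shift slightly, but the ordering of the two directions is unchanged.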


Clearly, if measured in the forward direction, when an input tonelike 
signal is being distorted into white noise, the distortion is higher than 
vice versa. The result is reversed if the distortion is measured in the 
backward direction; that is, distorting an input noise signal into a 
tonelike signal will result in a more serious objective distortion mea- 
surement than distorting a tone-like signal into white noise. Previous 
studies in auditory masking demonstrated a similar asymmetry of 
masking between tone and noise.²¹⁻²³ In particular, it has been reported
that noise masks a tone more effectively than a tone masks noise. A 
1-kHz tone masked by noise that is one critical band wide typically is 
inaudible at a signal-to-masker ratio of —4 dB, while the corresponding 
ratio for noise signal masked by tone is approximately —24 dB. In 
other words, it is easier to perceive noise in a tone than it is to perceive 
a tone in noise. For an objective measure to consistently predict the 
perceived quality, we thus would require that such a measure show 
higher distortion when the input tone is corrupted by noise and that 
it show lower distortion when input noise is distorted by an additive 
tone signal. Despite the slight difference between masking and distor- 
tion, forward measurements of (33) thus appear to be more justifiable. 
More rigorous psychoacoustic studies are obviously very important in 
carefully resolving this measurement direction issue. 


3.2 Correlated spectral distortion 


Compared to additive noise, correlated spectral distortion has not
been as well studied in the past, but it is a key factor affecting the
perceived quality. One well-known example is that “telephone speech”,
which is essentially bandlimited to the range of 200 to 3200 Hz, is 
considered to be of poorer quality and of lower intelligibility than the 
unfiltered original speech. Since correlated spectral distortion can be 
a result of the filtering operation, we shall discuss it using linear 
filtering concepts. 

Linear systems can be categorized into time-invariant and time- 
variant systems. Accordingly, correlated spectral distortion can be 
time-invariant or time-variant as demonstrated in eq. (19), where the 
correspondence between the filtering operation and the distortion 
measure was established. The above-mentioned bandpass filtered 
speech signals, such as telephone speech, have essentially a time- 
invariant spectral distortion (here we are not considering tone noise, 
clicks, or channel variations, etc.), while Linear Predictive Coding 
(LPC) vocoders involve many time-variant spectral distortions, as will 
be discussed shortly. 

The use of the Itakura-Saito type of measure for time-invariant 
spectral distortions, such as (3), (14) and (15), appears to be justifiable, 
at least within the short-time frame boundary where stationarity is 
reasonably assumed. This can be seen from the application of the 
likelihood ratio measure in vector quantization for voice coding.²⁴ In
fact, the code words designed for vector quantization using (14) are
substantially consistent with the vowel triangle of Peterson and Barney
from an acoustic-phonetic point of view.⁸ It also has been shown
that the log likelihood ratio measure usually leads to a better recognition
rate in speech recognition schemes.¹⁰,²⁵ (Note that the log
likelihood ratio and the likelihood ratio measure make no significant 
difference in most speech recognition applications. The only theoret- 
ical difference is in template generation where minimization of some 
criterion, such as the average distortion or maximum distortion, is 
required.) For interests in psychoacoustic studies, however, it may be 
desirable to further translate the measurement into a perceptual scale 
that better interprets the relative perceived quality. (The complication 
here is the possible sound dependence on a perceptual scale. Consider 
the following example. Suppose $X$ has been distorted, resulting in $Y$
and $Z$. We can confidently say that sound $Y$ is closer to sound $X$ than
sound $Z$ is, if $\rho[X, Y] < \rho[X, Z]$. However, we are not sure that $Y$ is
perceptually closer to $X$ than $Z$ is to $W$, even if $\rho[X, Y] < \rho[Z, W]$.)

Beyond the short-time stationary segment level, the time-variant 
distortion is a more important and complicated factor to consider in 
speech processing. Spectral distortion measures are defined for every 
pair of spectral representations. A natural extension of the distortion 
measure for measuring dissimilarity between time-varying signals is 
thus the distortion sequence expressed by (7). A review of previous experiments
and several reported results that help illustrate the effect of time-variant
spectral distortions upon speech quality is in order.

Voice coding results in time-variant spectral distortion. The key 
contribution to the time variation of distortion in voice coding such 
as LPC is a result of parameter quantization, although the parameter 
analysis procedure itself may also introduce some time-variant distor- 
tion because of frame alignment, change in excitation, etc. The effect 
of such distortion thus can be best explained in performance compar- 
ison of different parameter quantization schemes. 

The experiment in Ref. 7 that compared the distortion performance 
of vector and scalar quantization for LPC voice coding provides 
important insights in this regard. In order to conduct the so-called 
equal average distortion comparison in the experiment, speech signals 
were vocoded at a lower bit rate with vector quantization and at a 
higher bit rate with scalar quantization. Subjective comparison of 
these two sets of synthesized signals of equal average distortion showed 
that the vector quantization synthesis samples sounded smoother and 
more pleasant, and were considered of better quality. Substantial 
background warble was perceived in the scalar quantization samples. 
Differences in spectral continuity, distortion contour, and some statistics
of the distortion process $\{\rho_n\}$ between the two sets of synthesis
samples were then reported to explain the difference in the perceived 
synthesis quality. It was concluded that a coder that preserves more 
spectral continuity, achieves smoother distortion contour, and pro- 
duces less divergent distortion statistics is better than a coder with 
otherwise different distortion performance, even though they yield the 
same average distortion. Vector quantizers appear to produce “better” 
distortion sequences than do scalar quantizers in LPC voice coding. 
The importance of considering the distortion as a process or sequence 
(instead of just an average distortion) and of looking into the spectral 
continuity (a mathematical definition of which has yet to be obtained) 
was thus highlighted. 
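
The point that an average hides the behavior of the distortion sequence can be illustrated with a small sketch. The two sequences below are invented for illustration only and are not data from Ref. 7; the variance and the mean absolute frame-to-frame change are simple stand-ins chosen here, since the text notes that a mathematical definition of spectral continuity has yet to be obtained.

```python
def mean(seq):
    return sum(seq) / len(seq)

def variance(seq):
    m = mean(seq)
    return sum((v - m) ** 2 for v in seq) / len(seq)

def mean_abs_step(seq):
    """Average absolute change between consecutive frames."""
    return sum(abs(b - a) for a, b in zip(seq, seq[1:])) / (len(seq) - 1)

# Two hypothetical distortion sequences {rho_n} with the same average.
smooth = [0.20, 0.22, 0.21, 0.19, 0.18, 0.20, 0.21, 0.19]
erratic = [0.05, 0.40, 0.02, 0.35, 0.10, 0.45, 0.03, 0.20]

for name, seq in (("smooth", smooth), ("erratic", erratic)):
    print(name, round(mean(seq), 3), round(variance(seq), 4),
          round(mean_abs_step(seq), 3))
```

Both sequences average to the same distortion, yet the second fluctuates far more from frame to frame, which is the kind of difference the equal average distortion comparison cannot reveal.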

The concepts of time-variant distortions and spectral continuity
also raise a possible explanation for the experimental results of Tribolet
et al.²⁶ Here, performances of four different types of waveform
coders at three different bit rates were compared. An average noise-to-signal
measure [eq. (2) of Ref. 26], 4,, which was derived through
the concepts of log likelihood ratios, was used as an objective measure
to predict the subjective performance. As seen from Figs. 5 and 6 (duplicated
from Figs. 7 and 9a of Ref. 26), the main failures of the likelihood-ratio-derived
measure are in predicting the performance of all coders
at 9.6 kb/s [in particular, the Sub-band Coder (SBC) at 9.6 kb/s] and of the
Adaptive Differential PCM (ADPCM) coder with a fixed predictor at
24 kb/s. At 9.6 kb/s all coders perform subjectively worse than objectively
predicted. At 9.6 kb/s, SBC is objectively very close to the
Adaptive Transform Coder (ATC) but turns out to be subjectively
even worse than the ADPCM with a variable predictor. At 24 kb/s,
ADPCM with fixed predictor is objectively much worse than ADPCM
with variable predictor, but they in fact are subjectively very close.
These failures can be attributed to the fact that 4, does not correctly
consider the correlated spectral distortion, and more importantly, it is
only an average over the entire speech sample, revealing no information
on possible perceptual degradation due to time-variant spectral
distortions. The outcome that all coders perform subjectively worse
than objectively predicted at lower bit rates is probably a result of
increased sporadic spectral distortions and reduced spectral continuity
along the time axis. Sub-band coding schemes inherently preserve less
spectral continuity at lower bit rates, and thus it is possible that
relatively more quality degradation is perceived at 9.6 kb/s with SBC.
Finally, the ADPCM coder with adaptive, variable predictor potentially
introduces more spectral discontinuity, due to quantization of
the predictor parameters, than does the ADPCM with a fixed, unquantized
predictor.

Fig. 5—Quality median rating of 12 coders (65 listeners by 4 talkers).

Fig. 6—Objective noise-to-signal measure, 4,, averaged over 16 articulation bands for
the 12 coders in Fig. 5.

To illustrate this, plots of the log spectral (eighth-order all-pole) 
difference between the original and the reconstructed speech signals 
are shown in Fig. 7. Coders used in Fig. 7 are ADPCM with fixed 
predictors and adaptive predictors, respectively. More spectral discon- 
tinuity is observed in the adaptive predictor case, particularly in the 
low frequency region. Therefore, even though adaptive predictors yield 
higher prediction gain than fixed predictors,²⁷ this objective advantage
has been subjectively offset by the perceptual sensitivity to time- 
variant distortions, particularly at higher bit rates, where the effect of 
additive noise becomes relatively less significant. As a result, the 
subjective performance gap between the two coders is substantially 
reduced. 

Similar limitations apply to automatic speech recognition schemes 
that use one single average or accumulative figure to represent the 
dissimilarity between the spectral sequences of the input speech and 
the stored reference template. In parallel with the concept of meas- 
uring speech quality with the segmental s/n, recognition schemes 
usually resort to segmentation and time warping in order to obtain 
better distortion or distance measurements for more accurate recog- 
nition decisions. Nevertheless, segmentation schemes produce hard 
segmental boundaries, instead of natural, soft transitions, and are 
never completely reliable. The original problem of measuring the 
dissimilarity between time-varying signals thus has never been entirely 
solved.

Fig. 7—Log spectral difference between the original and reconstructed signals: (a)
with a fixed, unquantized predictor; (b) with an adaptive, quantized predictor.

The above considerations clearly point out the necessity of psychophysical
experiments for developing a better speech quality measure.
Specifically, with regard to using the Itakura-Saito type of measures, 
issues to be further studied are: the measurement direction, the feasi- 
bility of characterizing subjective quality by distortion sequences, and 
the incorporation of some transitive functions into the distortion 
measure to account for spectral continuity. In light of the analytical 
features of the Itakura-Saito type of measures, research on these issues 
appears to be vitally important to an analytical speech perception 
model. 


IV. SOME CONSIDERATIONS IN FUTURE PSYCHOPHYSICAL STUDIES 


It is beyond the scope of this paper to propose and discuss in detail 
the psychophysical experiment procedures necessary to answer all the
questions above. It is, however, appropriate to address one of the 
difficulties in psychoacoustic experiment designs here. In addition, we 
shall propose to consider a class of transitive functions to be used in 
defining the spectral continuity measure. 


4.1 Inverse filtering as a tool 


One of the fundamental difficulties in designing psychoacoustic 
experiments is the control of test stimuli. How to characterize and 
control the test signals is obviously not a simple matter when the
stimuli are real running speech signals. In parallel with this problem 
is the difficulty in defining a refined speech production model that, at 
least, adequately describes the real speech production mechanism. 

In studying perceptual responses to various spectral distortions in 
order to better analytically and dynamically characterize speech qual- 
ity with the Itakura-Saito measure, the difficulty fortunately can be 
greatly alleviated. In particular, the result of (21) allows us to modify 
a speech signal conveniently to meet the prescribed distortion require- 
ments. Clearly, if 


$$\{s'(i)\} = \oplus X'_n(z) = \oplus \frac{A_n(z)}{B_n(z)} X_n(z), \tag{45}$$

then

$$\rho[s(i), s'(i)] = \left\{ \rho_{IS}\left[\frac{1}{A_n}, \frac{1}{B_n}\right] \right\}. \tag{46}$$

Fig. 8—Signal modification procedures to achieve prescribed distortion characteristics.

Figure 8 illustrates the modification procedure. The speech signal is
first inverse filtered by $A_n(z)$ to obtain the residual $E_n(z)$, which then
drives the chosen filter $1/B_n(z)$ to form the desired signal.

Choosing $1/B_n(z)$ such that

$$\rho_{IS}\left[\frac{1}{A_n}, \frac{1}{B_n}\right] = \rho_p,$$

a prescribed value, can be made simple if we have a good-sized vector
code book, as designed in vector quantization.⁷ The search for $1/B_n(z)$
is then restricted to a finite set, although there is theoretically
an infinite number of all-pole filters. Also, the test stimuli designed
according to (45) are free from excitation variations, such as fundamental
frequency changes, that are better considered separately.
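
The modification procedure can be sketched as a code book search followed by inverse filtering and resynthesis. Everything specific in the sketch below is hypothetical: the filter $A_n(z)$ is simply given rather than estimated from the frame, the four-entry code book and the target value are invented, and the function names are illustrative. The distortion of each candidate is evaluated numerically from (21).

```python
import cmath
import math

def power(coeffs, w):
    """|P(e^{jw})|^2 for P(z) = sum_i coeffs[i] * z^{-i}."""
    return abs(sum(c * cmath.exp(-1j * w * i) for i, c in enumerate(coeffs))) ** 2

def distortion(a, b, K=1024):
    """Eq. (21): mean of |B|^2 / |A|^2 over a frequency grid, minus one."""
    grid = [-math.pi + 2.0 * math.pi * k / K for k in range(K)]
    return sum(power(b, w) / power(a, w) for w in grid) / K - 1.0

def nearest_codeword(a, codebook, target):
    """Pick the code book filter whose distortion from 1/A is closest to target."""
    return min(codebook, key=lambda b: abs(distortion(a, b) - target))

def inverse_filter(a, x):
    """Residual e(i) = sum_j a[j] * x(i - j), with zero initial state."""
    return [sum(a[j] * x[i - j] for j in range(len(a)) if i - j >= 0)
            for i in range(len(x))]

def all_pole(b, e):
    """Drive 1/B(z) with e(i): y(i) = e(i) - sum_{j>=1} b[j] * y(i - j)."""
    y = []
    for i, v in enumerate(e):
        y.append(v - sum(b[j] * y[i - j] for j in range(1, len(b)) if i - j >= 0))
    return y

A_n = [1.0, -1.2, 0.72]                      # hypothetical inverse filter
codebook = [[1.0, -1.0, 0.5], [1.0, -0.6, 0.3],
            [1.0, 0.0, 0.0], [1.0, 0.4, 0.2]]   # hypothetical code book
target = 0.5                                 # hypothetical prescribed distortion

B_n = nearest_codeword(A_n, codebook, target)
x = [math.sin(0.2 * i) + 0.5 * math.sin(0.9 * i) for i in range(200)]
residual = inverse_filter(A_n, x)            # E_n(z) in Fig. 8
x_modified = all_pole(B_n, residual)         # the modified frame
```

Choosing `B_n` equal to `A_n` returns the original frame, which is a convenient sanity check on the two filtering routines.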

4.2 Spectral continuity

As discussed above, spectral continuity is an important factor affecting
the perceived quality of speech signals. Speech signals carry
distinctive time-frequency or spectral transition patterns. Phonetic 
manifestation in articulated speech signals could be very fast, like 
/str/ in “strange”, or sustainingly slow, like /i/ in “eat”. To avoid 
complications due to such an inherent nonuniform spectral change, 
Ref. 7 used the model error spectral sequence {A,(w)}, defined by 
An(w) = log 


— log (47) 


1 1 

| An(e’*) |? | A,(e?*) |?’ 
where A,,(z) is a quantized version of A,,(z), to illustrate the difference 
of the ability of various quantization schemes in preserving spectral 
continuity. The rationale was based upon the fact that the ultimate 
spectral continuity to be retained is the inherent spectral transition 
pattern, and that if a coder produces spectral distortion that is inde- 
pendent of time, that is, 


An(w) = A(w) for all n, (48) 


then the time-variant spectral distortion is completely eliminated.
While $\{\Delta_n(\omega)\}$ adequately explained the spectral continuity differences,
more rigorous alternatives are necessary for at least the following
reasons: (1) the variation in $\Delta_n(\omega)$ along the frequency axis, $\omega$, as well
as the time axis, n, is often so substantial that it is difficult to use
only (47) to define a spectral continuity measure; (2) it was never
concluded that the change in $\Delta_n(\omega)$ along the time axis, if regarded as
an indication of spectral smoothness, is indeed perceptually independent
of the spectral transition pattern of the speech signal.
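
The model error spectrum of (47) and the time-invariance condition of (48) can be illustrated numerically. The following is a minimal sketch, not the procedure of Ref. 7: it assumes the inverse filter is written as A(z) = 1 + a_1 z^{-1} + ... + a_M z^{-M}, and the function names and the tolerance are hypothetical choices made only for this illustration.

```python
import cmath
import math


def inverse_filter_response(coeffs, omega):
    """Evaluate A(e^{j*omega}) for A(z) = 1 + sum_i coeffs[i-1] * z^{-i} (assumed convention)."""
    return 1 + sum(c * cmath.exp(-1j * omega * (i + 1)) for i, c in enumerate(coeffs))


def log_allpole_spectrum(coeffs, omega):
    """Return log(1 / |A(e^{j*omega})|^2), the log all-pole model spectrum."""
    return -math.log(abs(inverse_filter_response(coeffs, omega)) ** 2)


def model_error_spectrum(coeffs, quantized_coeffs, omegas):
    """Difference of the two log model spectra on a grid of frequencies, as in (47)."""
    return [
        log_allpole_spectrum(coeffs, w) - log_allpole_spectrum(quantized_coeffs, w)
        for w in omegas
    ]


def is_time_invariant(error_frames, tol=1e-9):
    """Check condition (48): every frame has the same error spectrum (within tol)."""
    first = error_frames[0]
    return all(
        abs(x - y) <= tol for frame in error_frames for x, y in zip(frame, first)
    )
```

A coder whose error spectrum passes `is_time_invariant` over all frames would, in the sense of (48), introduce no time-variant spectral distortion.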

Before we can completely characterize the spectral continuity along
both the frequency and time axes, we tentatively propose
two transitive functions that indicate the spectral
changes in a speech signal as a function of time. The notion of eq.
(21), measuring the distortion between two all-pole spectra, is emphasized
in defining such transitive functions. Denoted by $\phi_f(k)$, the
forward transitive function is defined by


$$\phi_f(k) \triangleq \sum_{n=1}^{\infty} e^{-n/\lambda_f}\, d\!\left(\frac{1}{|A_k|^2}, \frac{1}{|A_{k-n}|^2}\right), \tag{49}$$


where $\lambda_f$ is a time constant and $A_k(z)$ and $A_{k-n}(z)$ are the optimal
Mth-order inverse filters of $X_k(z)$ and $X_{k-n}(z)$, respectively. $\phi_f(k)$
measures the all-pole spectral change in the speech signal in a forward
manner, i.e., it measures the distortion resulting from replacing the
current spectral envelope with previous spectral envelopes. Characteristic
changes in excitation, such as the pitch inflection, are not actively
considered in $\phi_f(k)$, although they may affect the estimation of
all-pole spectral models. One interpretation of measuring the transition
in speech by the distortion between all-pole models instead of speech
spectra is that we try to keep the current excitation signal unchanged,
as if it were present in the previous segments as implied by eq. (21).
We also assume that the time constant $\lambda_f$, accounting for short-time
auditory memory,” is independent of the particular sound that is
articulated and perceived.
Similarly, we define the backward transitive function $\phi_b(k)$ as


$$\phi_b(k) \triangleq \sum_{n=1}^{\infty} e^{-n/\lambda_b}\, d\!\left(\frac{1}{|A_{k-n}|^2}, \frac{1}{|A_k|^2}\right). \tag{50}$$


Note that if the distortion measure were symmetrical and if $\lambda_f = \lambda_b$,
the two transitive functions would be identical. The appropriateness
of these functions remains to be studied.
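
The structure described for the two transitive functions, an exponentially weighted sum of distortions between the current frame and earlier frames, can be sketched as follows. This is an illustrative sketch only: the distortion is passed in as a function rather than taken from eq. (21), the sum is truncated at the start of the signal, and the argument order given to the distortion in each direction is an assumption. The function names are hypothetical.

```python
import math


def forward_transitive(frames, k, distortion, time_constant):
    """Weighted sum over earlier frames of distortion(current, earlier).

    The argument order (current first) is an assumption of this sketch.
    """
    return sum(
        math.exp(-n / time_constant) * distortion(frames[k], frames[k - n])
        for n in range(1, k + 1)  # truncated where the signal begins
    )


def backward_transitive(frames, k, distortion, time_constant):
    """Same weighted sum with the two distortion arguments exchanged."""
    return sum(
        math.exp(-n / time_constant) * distortion(frames[k - n], frames[k])
        for n in range(1, k + 1)
    )


def squared_difference(a, b):
    """A symmetric toy distortion between two scalar 'spectra'."""
    return (a - b) ** 2


def ratio_distortion(a, b):
    """An asymmetric toy distortion, a/b - log(a/b) - 1, on positive scalars."""
    return a / b - math.log(a / b) - 1
```

With a symmetric distortion and equal time constants the two functions coincide, as the text notes; with an asymmetric distortion they generally differ.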

The transitive functions are to be regarded as part of the speech 
signal. When a speech signal is distorted because of processing or 
encoding, the corresponding transitive functions are distorted also. 
The distortion, or noise, in the transitive functions thus provides a 
measure of the time-variant spectral distortion that affects the spectral 
continuity in the original signal. Further research effort, of course, is 
necessary to verify the suitability of these functions or to develop a
better spectral continuity measure. We feel that the concept in (49) 
and (50) provides a good starting point. 


V. CONCLUSION 


While the Itakura-Saito distortion measure and its variations have 
been widely employed and are considered promising in characterizing 
speech quality,’ limitations in such measures still exist and have been 
identified within the theoretical framework of a generalized waveform 
coder distortion model in the above discussion. This type of measure 
is inherently nonsymmetric and therefore, in measuring distortions, a 
proper measurement direction needs to be determined. Subjective 
quality evaluation involves perceptual response to various degrees of 
distortion that has to be considered as a time function or a stochastic 
process. The feasibility of describing the subjective quality by finite- 
order statistics of the distortion process is to be studied. Furthermore, 
evidence shows that speech spectral continuity is also a key, if not the 
most important, factor affecting the subjective quality and thus, the 
speech spectral transition pattern should be regarded as a vital part 
of the speech signal. An even more fundamental and difficult task is, 
then, the incorporation of the spectral transition patterns into the 
rather static measurements of the Itakura-Saito distortion. Psychoa- 
coustic studies are necessary to resolve these issues. 

We propose a switch, suitable for an integrated local communications
network, that will support packet switching and circuit switching, with a wide 
range of bit rates. Key components are two serial memories; a multiplicity of 
access units, each capable of writing and reading uniformly formatted, ad- 
dressed information; and a programmed controller. Circuit switching is 
achieved when the controller repeatedly allocates memory slots, following call 
setup. Data communications can proceed concurrently without setup, compet- 
ing for unused slots. We give an example of a 10,000-telephone-line switch 
carrying a similar load of other traffic. The switch would delay voice by less 
than 5 ms and could be interfaced to the existing telephone system. We 
indicate a method of fault detection and isolation that will limit the impact of 
a failure on a serial memory to an arbitrarily small group of connected lines. 
We define an index for measuring failure impact and use it to derive most- 
favorable fault-isolating partitions. 


I. INTRODUCTION 


The telephone system is by far the world’s largest communications 
network. It was primarily designed for voice, but its role widens 
continuously, as it adapts to new requirements. Presently it is changing 
to accommodate data communications. 

Already the network extensively caters to data communications, but 
not yet as well as it might. Although internally the telephone system 
is rapidly becoming a vast interconnected computer system, proffered
data are still largely carried as analog signals externally. That will 
change, however, as special provisions for data come on-line. As more 
of the plant, including switches, becomes digital, it will be possible to 
offer, on a selective basis, switched digital telephone channels usable 
for 56-kb/s data throughput. Also, packet-switched data services will 
widen in scope and access. Packet-switched data services are overlay 
networks that use the digital transmission facilities of the telephone 
network but bypass its switches, of which many are still analog. The 
packet networks eventually may become totally interconnected, just 
as the voice network, and also may become integrated with it. 

In-house, or proprietary, telephone networks can benefit from the 
changing character of the overall network more immediately. Already 
available are switches and other components that permit an all-digital 
network that will accommodate on one facility both voice and data. 
As good as this already is, we are proposing a switch that could make 
the private network even better. Eventually it might even influence 
the entire system. 

Currently available switches provide only circuit-switched connec- 
tions. This gives fixed-capacity channels on a continuous basis, 
whereas much of data comes in bursts. Thus, computer communica- 
tions are characterized by very long call durations with only low 
average, but in many instances very high, peak rates. Given the option, 
direct memory transfers could proceed in some instances at rates of 
many megabits per second. This is far too high for a switched and 
continuously held circuit. 

It is true that the needs of bursty traffic can be catered to by what 
already is available, namely by some packet-switched networks. But 
that introduces a separate communications network for data, with the 
consequences of proliferating wiring plans, divided responsibilities, 
and probable long-term diseconomies. It is better for one facility to
serve all communications, and to do so without imposing mismatches. 

We propose a switch and, more generally, a new switch architecture 
that support within one switching fabric both circuit- and packet- 
switched connections. This would largely avoid mismatches in respect 
to bursty data traffic, while preserving unity in communications. 

The cardinal components of the switch (see Fig. 1) are a pair of 
Serial Memories (SMs), a Central Controller (CC), and Accessing 
Units (AUs). The memories do not recirculate and both ends (head 
and tail) of each terminate on the central controller. The AUs are 
connected to read-and-write taps along the SMs, an AU having one 
connecting tap to each memory. The two taps of an AU form a 
symmetrical pair: the tap to the second memory is as many places 
from the tail end as that to the first is from the head. Thus, each AU
can reach every other AU by one or the other memory. It can
reach, and be reached by, the central controller by either memory. All
writing is logical OR.

Fig. 1—Block schematic of switch. Access Units (AUs) communicate via two Serial
Memories (SMs) on behalf of client Stations (Sts). Central Controller (CC) reserves
slots for circuit-switched communications.
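
The symmetrical tap arrangement and the OR writing can be illustrated with a small model. The sketch below is only an illustration under stated assumptions: each memory is modeled as a list of words that moves one place toward the tail per clock, AUs are indexed from 0 at the head of the first memory, and the class and function names are hypothetical.

```python
class SerialMemory:
    """A non-recirculating serial memory modeled as a clocked shift register."""

    def __init__(self, length):
        self.cells = [0] * length

    def shift(self, head_input=0):
        """Advance one clock: every word moves one place toward the tail."""
        tail_output = self.cells[-1]
        self.cells = [head_input] + self.cells[:-1]
        return tail_output

    def write(self, position, word):
        """Writing is a logical OR into whatever is passing the tap."""
        self.cells[position] |= word

    def read(self, position):
        return self.cells[position]


def tap_positions(au_index, length):
    """Taps of one AU: as far from the head of the first memory as from the tail of the second."""
    return au_index, length - 1 - au_index


def memory_for(sender, receiver):
    """Choose the memory on which the receiver lies downstream of the sender (distinct AUs assumed)."""
    return "first" if receiver > sender else "second"
```

For example, in an 8-place model a word written by AU 2 on the first memory reaches the tap of AU 5 after three clocks, while AU 5 reaches AU 2 only by the second memory.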

An AU acts as an agent of a client station (St) (e.g., telephone, 
facsimile terminal, computer) and mediates communications between 
it and other stations by way of corresponding AUs. Communications 
are carried on by write-and-reads in memory/time slots of uniform 
length and format. Each slot consists of a data field and several control 
fields. Collectively, the control fields provide synchronization, “Slot 
busy” indication, source and destination addressing, and slot pleading. 
A circuit-switched communication is carried on in regularly recurring 
slots, which are appropriately premarked by the central controller. 
For a packet-switched communication an AU simply uses the next 
available slot. 

The capacity of a connected circuit can vary over a wide range, from 
a small fraction of a single (64-kb/s) telephone channel up to a large 
multiple of that capacity. It is settled by negotiation with the CC at 
the time of circuit setup, and need not be the same on different 
occasions. The capacity available to a packet-switched communication 
depends on the prevailing competition and can be any portion of the 
total switch capacity. The latter is a function of size and would be just 
several megabits per second for a 100-line switch and several hundred 
megabits per second for a switch that supports 10,000 lines. 
A simple realization of the serial memories would be by clocked
shift registers. The shift registers can be bit-paralleled to any degree 
needed to keep the clock rate low. The memories and all access units 
can be located centrally at the controller, with all connections to the 
switch then forming a single star. But it is also possible to segment 
the memories and form the network in clusters. The segments of 
memory would be connected serially to each other and to the central 
controller by transmission lines, forming two contrary rings, and the 
clusters again would form star topologies. 

All elements of our proposal are well established and tried. Central, 
or stored program, control in circuit switching is over twenty years 
old.'! The idea of switching by time slot interchanges is even older,” 
followed shortly by its realization through read-and-writes in computer 
memory.** Packet switching is more recent,° but is also well estab- 
lished both in local and wide-area networks.®® 

In essence, our scheme is an adaptation of seemingly diverse pro- 
cedures, so that they may coexist. Time division slots are enlarged 
from what is usual in circuit switching, so that they can carry the 
control information essential to packet switching. Unlike normal 
packet-switched schemes, packets are of a single fixed length so that 
they can also be circuit-switched. Instead of separate time and space 
division stages, common in current telephone switches, we have a 
combined space/time fabric, abstracted from ring and bus networks, 
with a particular debt to Fasnet.? This makes packet switching possible 
without controller intervention. Finally, the controller maintains cir- 
cuit connections by repetitive slot allocations, which is only marginally 
different from what takes place in a time division stage of a standard 
switch. 

Also, our proposal is not first in its suggestion that voice and data 
be integrated on a common network.’ But it appears to be first in 
suggesting a common switch for circuits and packets as the basis for 
that integration. With few exceptions,” prior suggestions have been 
to treat voice as data and to packet-switch it both in local and wide- 
area networks. However, these proposals have attendant delays that 
have to be addressed. 

The point is important since, in the global telephone system, trans- 
mission delays can limit the quality of many possible connections. In 
the case of our switch, the delay of voice signals can be kept to less 
than 5 ms. It depends only on the clock rate, the size of switch, and 
the size of slots. Since delay considerations have an overriding sway 
on system choices, we discuss them in Section II. In Section III we 
give further details of our proposal. 

In Section IV we address the question of reliability in our switch. 
We do this because our proposal may be seen as being particularly 
vulnerable, since all its communications are to take place via two
serial memories to which all AUs have writing privileges. We introduce 
a scheme for sectional detection and isolation of faults applicable to 
our switch. We show that this would limit the impacts of faults in our 
case to those that would prevail in switches that have much more 
dispersed and/or redundant architectures. 


II. DELAY CONSTRAINTS AND RATE REQUIREMENTS


A communication system is expected to deliver messages to the 
destination in a timely fashion. The permitted delay is different in 
character for data and for real-time signals. We review the two cases 
separately. 


2.1 Data transmission 


Within limits, the exact times of arrival at the destination of the 
different parts in a data stream generally are unimportant. It is usually 
required that the sequence in the stream be preserved and that the 
average delay does not exceed some specified value. When data are 
presented for transmission at a fluctuating rate and there is not 
sufficient transmission capacity to cope with the peaks, the flow is 
smoothed by buffering. Waiting times in buffer stores are the predom- 
inant cause of delay.’° 


2.2 Real-time signals 


In the transmission of real-time signals, the delay should be
constant and not greater than a specified value. Given fluctuation in
transmission rate, there will be a time-varying delay $W_s(t)$ in a buffer
at the sending end. A further delay $W_r(t)$ must be deliberately introduced
in a buffer at the receiver,!® so that the total delay stays
constant:


$$W_s + W_r = D. \tag{1}$$


D is the fixed buffer delay for which the system has been designed.

If, at some time, $W_s$ exceeds D, then, at the same time, the buffer
at the receiver will become empty and there will be a break in the
received signal. Hence, there is no point in storing more at the
transmitter than the amount of data that represents the total designed
delay. If the rate $\lambda$ of the real-time data is constant, as in Pulse Code
Modulation (PCM) voice, then waiting times are directly related to
amounts of stored data. The buffer-store capacities, $N_s$ and $N_r$, that
need to be provided at the two ends are equal, given by


$$N_s = N_r = \lambda D. \tag{2}$$

If $\lambda$ is not constant, then the required capacities are still equal and are
found by substituting the maximum value of $\lambda$ in eq. (2).

It is important to note that, given fluctuations in data and/or 
transmission rate(s) and buffer stores to smooth them, the relevant 
delay for real-time signals is the maximum, i.e., designed, value, not 
the statistical average. How much larger that designed delay is to be 
than the average depends on the actual fluctuations in rate(s) and the 
relative tolerance to lost quality by signal discontinuities and by delay. 


III. DESCRIPTION OF SWITCH


We now detail several aspects of the proposed switch. We give an 
indication of architectural options, describe protocols, and suggest 
suitable parameters for a 10,000-line switch. 


3.1 Architecture 


The basic configuration of the switch was shown in Fig. 1 and 
outlined in the Introduction. The functions of the AUs and the CC 
will be defined in more detail when we discuss protocols in the next 
subsection. It will be seen that there are considerable differences in 
the tasks of an AU that is mediating a circuit-switched, as compared 
to packet-switched, communication. Further differences in speed and 
buffer requirements may be identified between, and within, those two 
categories. 

Clearly, there is a choice between designing a number of special- 
purpose AUs and designing a single universal AU. Further choices 
concern sharing of, and multitasking by, access units. Should AUs be 
placed in a common pool and shared by a larger group of stations? 
That would entail further switching outside the main switch to mediate 
connections between AUs and stations. Should an AU be multitasked, 
serving simultaneously different stations? That would make the AU a 
more complex device. Figure 2 illustrates a switch that incorporates 
both sharing and multitasking. 

Our inclination is towards universal AUs, one to each station, and 
towards neither sharing nor multitasking. True, this calls for the 
largest number of AUs, and not the least complex, at that. But it has 
the advantage of uniformity and, in the light of technology trends, of 
likely overall economy. 

The next choice concerns the serial memories. They may be active,
made in semiconductor, or, reverting to earlier technologies,
passive, e.g., acoustic or electromagnetic delay lines. Passive components
are attractive because they promise more reliability. However,
our purpose would be better served by clocked shift register memories
in an arrangement such as that shown in Fig. 3. This makes for easier
synchronization and permits bit-paralleling to hold down clock rates.
Reliability is a matter of overall design and implementation. In Section
IV we discuss an architecture-related aspect of reliability, namely
isolation of faults to limited sections.

Fig. 2—Different options in AU tasking. All AUs could be of one type, each serving
a single station (right); AUs could be shared by a larger group of stations requiring
selector switching outside the main switch (center); or an AU could be multitasked,
serving more than one type of station (left).

Finally, we have the question of overall network topology. Three 
different arrangements are shown in Fig. 4. Figure 4a shows the 
traditional topology of a central switch and star network. In Fig. 4b a 
completely distributed arrangement is shown in which the serial 
memories wend their way past every station. This would make it 
similar to a local area ring network and would be possible only with 
passive lines as the memories. A compromise between the above two, 
and an interesting topology for a PBX that has to serve an extended 
area, is shown in Fig. 4c. The serial memories are cut into sections, 
and each section is placed close to the group of stations that it serves.
The lines connecting the individual sections and the central controller
could be optical fibers, which would carry the total information streams
serially even in a large switch.

Fig. 3—Shift register realization of serial memories. Access points are at the inputs
of clocked unit delays; the writing is through OR gates.


3.2 Protocols 


In the context of our proposal, a packet used in packet-switched 
communication is made up of five control fields and data, as shown in 
Fig. 5. The same format could be used in circuit-switched packets. But 
for these, at least one of the two address fields is unnecessary. Its 
space may either be added to the data field, or it could be used as a 
separate channel, a companion to the main channel. 

The six fields marked in Fig. 5 are: 

1. BUSY: a single bit to indicate slot occupancy
2. RQST: a single bit, common channel used for slot pleading
3. SNDR: address or password of AU sending packet
4. RCVR: address or password of AU intended to receive packet
5. DATA: data field
6. SYNC: synchronization field.

The roles of all the fields, except RQST and SYNC, are self-evident. 
RQST is used by packet-switching AUs and we will see its function 
presently when we discuss data communications. The SYNC field is 
written by the central controller to ensure slot and frame synchroniz- 
ation. Although both synchronizations could be achieved with just one 
bit per slot, a field of two bits will make them more secure. Altogether, 
the following numbers would be of the right order: BUSY and RQST 
one bit each, SYNC two bits, the addresses 14 bits each, and DATA 
192 bits, for a total packet of 224 bits, or 28 bytes. 
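
The field widths just quoted can be checked by packing a slot into a single integer and unpacking it again. This is a sketch under assumptions: the order of the fields within the slot (here the order of the list above) and the function names are hypothetical, since the text fixes only the widths.

```python
# Field widths in bits, as quoted in the text; the ordering is an assumption.
FIELDS = (
    ("BUSY", 1),
    ("RQST", 1),
    ("SNDR", 14),
    ("RCVR", 14),
    ("DATA", 192),
    ("SYNC", 2),
)
SLOT_BITS = sum(width for _, width in FIELDS)


def pack_slot(values):
    """Pack a dict of field values into one integer, first field most significant."""
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_slot(word):
    """Recover the field values from a packed slot."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values
```

The widths sum to 224 bits, i.e., 28 bytes, in agreement with the total given above.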

Corresponding to a packet, one may think of a time slot and of a
propagated memory block as consisting of the same number of bits
and divided among respective fields. Note, however, that it is not
necessary that the different parts of a packet be placed into a single
slot or block. There may be interleaving of packet parts to any extent
that is desirable.

Fig. 4—Network topology options: (a) central switch, stations connected by lines; (b)
switch completely distributed with AUs at individual stations and connected by the
serial memories realized as buses; (c) switch distributed in clusters, the serial memories
within clusters realized by shift registers and between clusters by transmission lines.

Thus, it is conceivable that in order to alleviate the pressure of time
for the signaling from receiver to transmitter within the AU, the
BUSY field could refer to the state of occupancy, not of the slot that
it is riding in, but of the following slot. Similarly, other fields could be
advanced or retarded, and not necessarily by only one slot, nor, indeed,
just as a complete field. Thus, the DATA field could be broken into
single bytes or even bits, and the fragments made to follow the header
as an arbitrarily dispersed tail, provided only that all packets are
fragmented and dispersed identically.

Fig. 5—Slot format. Typically, BUSY, RQST and SYNC would be one-bit fields, the
address fields could be two bytes each and the data field 24 bytes.

The extreme fragmentation of packets, as just alluded to, may seem 
an attractive way of restoring smoothness to data flow for circuit- 
switched communications. Indeed, almost complete smoothness is 
possible for any one chosen rate. But it would be at the expense of 
considerable complication for all other communications, particularly 
the packet-switched and the circuit-switched that have higher rates 
than the one singled out for favorable treatment. We will dismiss it 
from further consideration and turn to describing procedures. 


3.2.1 Data communication 


Assume that AU addresses are in numerical order along the two 
memories, ascending in the direction of propagation along one and 
descending along the other. We will call the memory with ascending 
addresses the forward channel, and hence the other the reverse chan- 
nel. 

Suppose that an AU has to communicate to another AU of higher 
address. It must send a message, or packet(s), on the forward channel. 
To do so, the dispatch processor of the AU will follow the data dispatch 
routine of Fig. 6. This can be understood more easily with the help of 
the state diagram of Fig. 7. For the sake of description, this diagram 
relates to an exclusive forward channel dispatcher, although in practice 
a single dispatcher would service both directions. 

When idle, the dispatcher is normally in the “Go” state and monitors 
the sending buffer (for the forward channel), checking whether it 
contains a packet for transmission. If it does, it reads the BUSY field 
of the next block on the forward channel and at the same time writes 
a “ONE” in that field so as to seize the slot, should it be available. If 
it is not, i.e., BUSY was already “ONE,” then it will write “ONE” in
the next RQST field on the reverse channel and wait for the next 
BUSY field on the forward channel. It will repeat reading and writing 
of BUSY on the forward channel and sending RQSTs on the reverse 
channel until a “ZERO” BUSY occurs. It will then write in the related 
SNDR, RCVR, and DATA fields, so dispatching a packet. 

Having sent a packet, the dispatcher moves to the “One packet sent”
state. If the sending buffer has at that moment one or more further
packets for dispatch, then the dispatcher will behave exactly as in the
“Go” state and send off the next packet, thereby moving to the “Two
packets sent” state. But if there is no packet in the sending buffer on
entry to the “One packet sent” state, then the dispatcher will proceed
to the “Halt” state. It will remain there until the next “ZERO” is
written in the RQST fields on the reverse channel, whereupon it will
revert to the “Go” state. Similar conditions apply on entry to the “Two
packets sent” and further states, until the dispatcher has sent in a
contiguous sequence M packets and entered the “M packets sent”
state. From this it must proceed unconditionally to “Halt.”

Fig. 6—Flowchart of forward channel data dispatch routine.

Fig. 7—State diagram of data dispatcher. The dispatcher goes temporarily into HALT
state whenever it has no more packets to send or has already sent M packets since the
last HALT. It goes from HALT to GO as soon as the RQST bit on the reverse channel
is ZERO.


M is a parameter that may vary with AU. It represents priority 
standing: The larger its value, the less sensitive the AU is to pleadings 
for slots by other AUs that are downstream from it. It is normally set 
in relation to the rate of the station that the AU serves. 
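
The dispatch states described above (“Go,” “k packets sent,” and “Halt”) can be sketched as a small state machine. This is an illustrative sketch rather than the routine of Fig. 6: it assumes one call per slot, with the BUSY bit read on the forward channel and the RQST bit read on the reverse channel supplied as arguments, and it assumes that leaving “Halt” takes one slot. The class and method names are hypothetical.

```python
class ForwardDispatcher:
    """Forward-channel data dispatcher with burst limit M (max_burst)."""

    def __init__(self, max_burst):
        self.max_burst = max_burst  # the parameter M of the text
        self.sent = 0               # packets sent since the last "Go"
        self.halted = False
        self.queue = []             # sending buffer

    def step(self, busy_bit, rqst_bit):
        """Handle one slot; return (packet sent or None, whether RQST is written)."""
        if self.halted:
            # Remain in "Halt" until a ZERO is seen in RQST on the reverse channel.
            if rqst_bit == 0:
                self.halted = False
                self.sent = 0
            return None, False
        if not self.queue:
            return None, False      # idle in "Go"
        if busy_bit == 1:
            return None, True       # slot taken: plead for a slot via RQST
        packet = self.queue.pop(0)  # slot seized: write SNDR, RCVR, and DATA
        self.sent += 1
        # Halt after M packets, or when nothing further is waiting on entry.
        if self.sent == self.max_burst or not self.queue:
            self.halted = True
        return packet, False
```

A larger `max_burst` lets the AU hold on for longer runs of slots before it yields to pleading AUs downstream, which is the sense in which M expresses priority.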

The task of receiving is less involved but no less time consuming, 
and an AU will have a separate processor for it. A routine that it could 
follow is given in Fig. 8. This is set out on the assumption that the 
SNDR and RCVR fields of a packet would precede the DATA field by 
one slot. 


3.2.2 Real-time signal transmission 


An AU serving a real-time device has to act in two distinct modes,
one in setting up or tearing down a circuit and the other in transmitting
and receiving the real-time signals when the circuit is set up. We
outline the procedure, limiting our attention to telephony. Other
devices requiring circuit connections would be served similarly.

Fig. 8—Flowchart of data receiver routine.

Looked at from the telephone, the AU would appear as the line
selector of the standard switch. When the telephone is taken off hook, 
the AU would supply dial tone. As the number is dialed, it would be 
stored by the AU, which, on completion, would assemble a packet for 
transmission to the CC. The DATA field of that packet would disclose 
the fact that a telephone link is being sought, and the numbers of the 
calling and called stations. The sending procedure for the packet could 
follow the routine of Fig. 6, even though a simpler routine is possible 
since no “Halt” state is necessary. 

The CC would process received requests using a routine that could 
be as in Fig. 9. First, the CC would check the total switch capacity 
already committed to circuit traffic, and from this it would decide 
whether the setting up of the further circuit is permitted. If it is not 
permitted, then the CC would inform the originating AU, and that 
would terminate the processing. If setting up the circuit is permitted, 
then the CC would determine which AU serves the called station and 
check whether it is engaged. If it is engaged, then the CC would inform 
the originating AU accordingly. If it is not engaged, then the CC would 
tag both AUs as engaged and send messages to both AUs to inform
them of each other’s addresses. The two addresses would also be
inserted at appropriate places in ring buffers to cause the necessary
premarking of slots by writing of BUSY and SNDR on the correct
channels at the right frequencies. This would complete the setting up
of the two-way circuit. Given a setup circuit, the dispatch and reception
of the real-time data would follow the routines of Fig. 10.

Fig. 9—Flowchart of circuit setup routine in central controller.
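
The sequence of decisions in the circuit setup routine can be sketched as follows. This is a simplified illustration, not the routine of Fig. 9: capacity is counted in whole circuits, the ring buffers that premark slots are reduced to a list of connected AU pairs, and the class name, method name, and returned status strings are all hypothetical.

```python
class CentralController:
    """Admission and engagement checks for circuit setup (simplified sketch)."""

    def __init__(self, circuit_capacity, station_to_au):
        self.circuit_capacity = circuit_capacity  # circuits that may be committed
        self.station_to_au = dict(station_to_au)  # station number -> AU address
        self.engaged = set()                      # AUs tagged as engaged
        self.circuits = []                        # stands in for the ring buffers

    def setup_circuit(self, calling_station, called_station):
        # 1. Is further circuit traffic permitted?
        if len(self.circuits) >= self.circuit_capacity:
            return "no capacity"
        # 2. Which AU serves the called station, and is it engaged?
        caller_au = self.station_to_au[calling_station]
        called_au = self.station_to_au[called_station]
        if called_au in self.engaged:
            return "called party engaged"
        # 3. Tag both AUs and record the pair so that slots can be premarked.
        self.engaged.update((caller_au, called_au))
        self.circuits.append((caller_au, called_au))
        return "connected"
```

In each refusal case the returned status stands for the message that the CC would send back to the originating AU.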

Note that only the SNDR address is used in circuit-switched
transmissions: The receiver is given the sender’s address and recognizes it
for the duration of the call. Apart from saving one address field for
other use, there is a further bonus in that more than one receiver can
be given the same SNDR address and simultaneously receive the same
real-time signal. This leads to the possibility of a simple arrangement
for broadcasting to designated outlets for, say, a public address system.
If, furthermore, the AU’s receiver capability were enlarged to note
several SNDR addresses and take in packets with those markings,
then a telephone conference facility, with voice signal summation at
each receiver, would be possible.

Fig. 10—Flowchart of (a) real-time signal dispatch, and (b) real-time signal reception.

The setting up, tearing down, and maintaining of calls to subscribers 
outside the switch’s own area would have to interwork with equipment 
in other offices. But there is no particular problem about this. The 
AU serving a trunk would interface with the outside system, sending 
and responding to signals in conformity with existing specifications. 
But, in other respects, it would not be different from an AU serving a 
local subscriber. Data out of, and into, the local switch area could also 
be carried by circuit-switched trunks, with suitable interfacing to a 
wider-area data network. The role of an AU providing that interfacing 
would then amount to that of a gateway processor. 


3.3 Packet size and clock rates 


The bit rate required along the SMs is related to the total peak load 
for which the system is designed, multiplied by a factor that accounts 
for efficiency. Assuming a telephone voice signal sampled at 8 kHz, 
represented by 8 bits per sample and an allowed delay due to packetization
of 3 ms, a packet may contain 24 samples or 192 bits of data.
The overheads are mainly in the addresses: Assuming a 10,000-voice- 
line switch and a total number of AUs not exceeding 16K, the SNDR 
and RCVR fields could be 14 bits each. As we already noted, BUSY 
and RQST need be only 1 bit each, and SYNC 2 bits. The total 
overhead will then be 32 bits and the packet length will be 224 bits. 
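
The packet dimensions quoted above follow from simple arithmetic, set out here for reference. The efficiency figure in the last line is not stated in the text; it is derived from the same numbers.

```latex
\begin{align*}
\text{samples per packet} &= 8000\ \text{samples/s} \times 0.003\ \text{s} = 24,\\
\text{DATA} &= 24 \times 8\ \text{bits} = 192\ \text{bits},\\
\text{address range} &= 2^{14} = 16{,}384 = 16\text{K},\\
\text{overhead} &= \underbrace{1}_{\text{BUSY}} + \underbrace{1}_{\text{RQST}}
  + \underbrace{2}_{\text{SYNC}} + \underbrace{14}_{\text{SNDR}}
  + \underbrace{14}_{\text{RCVR}} = 32\ \text{bits},\\
\text{packet length} &= 192 + 32 = 224\ \text{bits} = 28\ \text{bytes},\\
\text{efficiency} &= 192/224 = 6/7 \approx 0.857.
\end{align*}
```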

Another criterion by which the overall size of a packet can be 
decided is efficiency. Since the allowable delay for voice is binding, 
the best size indicated for maximum efficiency will be of interest only 
if it is smaller than that already decided. 

It is a reasonable simplification to suppose that all offered traffic 
divides into two categories: very short bursts, and prolonged streams. 
Furthermore, it is reasonable to assume that the number of packets- 
per-second from the very short burst will be independent of packet 
size. Thus, such very short bursts would be produced by single ASCII 
characters from, and echoed to, computer terminals, when carried in 
individual packets. On the other hand, circuit-switched traffic and 
data file transfers are examples of streamed flow. 

Consider the total bit rate, $R$, that results from traffic consisting of
$b$ short bursts per second and an aggregated stream flow of $S$ bits per
second. If the packet has $h$ bits of header and $x$ bits of DATA, then

$$R = b(h + x) + (S/x)(h + x). \qquad (3)$$

This will be a minimum when

$$x = \sqrt{S h / b}. \qquad (4)$$
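
As a quick numerical illustration of eqs. (3) and (4), the sketch below evaluates the gross bit rate on either side of the stationary point. The function names are hypothetical; the sample values of `b`, `S`, and `h` are the ones used later in this section.

```python
import math


def total_bit_rate(x, b, S, h):
    """Eq. (3): gross bit rate when the DATA field holds x bits."""
    return b * (h + x) + (S / x) * (h + x)


def optimal_data_bits(b, S, h):
    """Eq. (4): DATA field size at which eq. (3) is smallest."""
    return math.sqrt(S * h / b)


b, S, h = 20_000, 111e6, 32   # bursts/s, stream bits/s, header bits
x_opt = optimal_data_bits(b, S, h)

# Rates at 90%, 100% and 110% of the optimum; the middle one should be lowest.
neighbours = [total_bit_rate(x_opt * f, b, S, h) for f in (0.9, 1.0, 1.1)]
print(round(x_opt, 1), [round(r / 1e6, 2) for r in neighbours])
```

Setting the derivative of eq. (3) to zero gives $b = S h / x^2$, which is where eq. (4) comes from; the neighbouring evaluations merely confirm that this point is a minimum.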


In a system serving a business, one might provide for a busy-hour 
voice traffic of 10 ccs (hundred call seconds) per telephone. In the 
switch, this will divide equally between the two memories. With 10,000 
telephones, the aggregate stream $S_v$ on each SM due to voice would
then be


$$S_v = 10{,}000 \cdot 5 \cdot 64{,}000/36 = 89 \text{ Mb/s}.$$


A reasonable assumption for the present is that all other traffic would 
amount to 20 percent of the total, or in our example it would be a 
further 22 Mb/s. 

For the sake of illustration, assume that the very short burst rate, 
b, is 20,000 packets per second. If each of these carries only one 8-bit 
byte, then the net traffic from them is 160 kb/s, a negligible amount 
within the assumed 22 Mb/s. But the gross traffic may be much larger,
depending on packet size. Hence, the decision for best size of DATA
field, which, with the numbers already invoked, follows from eq. (4):


$$x = \sqrt{111{,}000{,}000 \cdot 32/20{,}000} = 421 \text{ bits}.$$

For $x$ to be less than 192 bits, decided by delay considerations, the
very short burst traffic would have to be 4.8 times larger than was 
assumed. But the assumed rate is already large, and therefore it is 
unlikely that efficiency considerations would indicate a smaller packet 
than given from delay. 

Given packets of 224 bits and the numbers cited above, the rate, R, 
in each memory follows from eq. (3) as 134 Mb/s. If 8-bit bytes are 
propagated in parallel, then the required clock rate is 16.75 MHz, a 
none too demanding frequency for present technology. The packet 
rate, which is of greater relevance to AU and CC speeds, would be 598 
kHz. 
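
The chain of figures in this subsection can be reproduced as follows. This is a sketch under the stated traffic assumptions; the variable names are hypothetical, and small differences from the quoted values arise because the text rounds the total stream to 111 Mb/s before continuing.

```python
import math

TELEPHONES = 10_000
CCS_PER_MEMORY = 5            # 10 ccs per telephone, split over the two memories
VOICE_BIT_RATE = 64_000       # 8 kHz x 8 bits
HEADER_BITS, DATA_BITS = 32, 192
BURSTS_PER_S = 20_000

# 1 ccs = 100 call-seconds per hour = 1/36 erlang.
voice_stream = TELEPHONES * CCS_PER_MEMORY * VOICE_BIT_RATE / 36
total_stream = voice_stream / 0.8          # other traffic is 20% of the total

x_best = math.sqrt(total_stream * HEADER_BITS / BURSTS_PER_S)   # eq. (4)
burst_factor = (x_best / DATA_BITS) ** 2   # burst-rate growth needed for x_best < 192

packet_bits = HEADER_BITS + DATA_BITS
rate = BURSTS_PER_S * packet_bits + (total_stream / DATA_BITS) * packet_bits  # eq. (3)
clock_hz = rate / 8                        # bytes propagated in parallel
packets_per_s = rate / packet_bits

print(f"voice stream {voice_stream / 1e6:.0f} Mb/s, total {total_stream / 1e6:.0f} Mb/s")
print(f"efficiency optimum {x_best:.0f} bits, burst factor {burst_factor:.1f}")
print(f"R = {rate / 1e6:.0f} Mb/s, clock {clock_hz / 1e6:.2f} MHz, "
      f"packet rate {packets_per_s / 1e3:.0f} kHz")
```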

A switch would be designed for a given ultimate size and given an 
appropriate clock rate from the start. But it would not be necessary 
to give it immediately the full complement of AUs, nor, indeed, full 
lengths of memories. AUs could be added without any disruption and 
memory sections with only a minor pause. 


IV. RELIABILITY 


Availability of communications services is extremely important and 
has prompted switch designers to adopt the very highest standards of 
reliability.*'° Thus, it is accepted practice to have two identical central 
controllers, one being a “hot” standby that can take over at any 
instant. This and other common practices would also apply to our 
switch. The features by which our switch is rendered most vulnerable 
in respect to reliability are its serial memories, which carry all messages 
and are accessed by all AUs. Below we consider the general question 
of disruptive impact by failures and suggest a measure for it. Then we 
introduce a fault detection and isolation scheme that would make the 
robustness of switching by serial memories with multiple read-and- 
write taps comparable to that of much more redundant architectures. 


4.1 Failure impact 


In switching equipment, including ours, failures are unequal in 
likelihood and in disruptive consequence. We introduce the notion of 
expected failure impact. Let $\pi_{ik}$ be the probability that component $C_i$
will suffer a failure of kind $k$ during the course of one year; let the expected repair time for
it be $\tau_{ik}$; and let the number of potential communication connections that
are unavailable while $C_i$ is in the failed state be $\nu_{ik}$. We define $U_i$, the
expected per annum failure impact (EPAFI) of $C_i$, as


$$U_i = \sum_k \pi_{ik}\,\tau_{ik}\,\nu_{ik}. \qquad (5)$$


We assume that failures are statistically independent and disregard
the probability of another component failing during the repair time of
an existing failure. EPAFI values then are additive, and $U$, with respect
to an assembly of $N$ components, is


$$U = \sum_{i=1}^{N} U_i. \qquad (6)$$

We consider the expected failure impact of the SMs and all the AUs connected
to them. Suppose that there are altogether $N$ AUs, each with (1) a per
annum rate $\pi_1$ of failing in a way that affects only one subscriber and
takes time $\tau_1$ to repair, and (2) a rate $\pi_2$ of failing in a way that disrupts
communications on the memory past the failed AU and takes time $\tau_2$ to repair. Also,
let the memories have a rate $\pi_3$ of failing at each of the $2N$ connecting
points, with expected repair time $\tau_3$.

With $N$ mutually communicating AUs in the system, the number of
potential two-way communication links is $N(N - 1)/2$. If an AU
failure of the first kind occurs, then $\nu_1 = N - 1$ of these are disrupted.
When the failure of the AU is of the second kind, or when a memory
fails, then the number of disrupted links is much larger and depends
on the actual location of the failure. One can calculate an average
number on the assumption that all potential failure locations are
equally likely. It is found to be approximately $N^2/3$. Hence the total
expected impact due to failures of AUs and memory links is


$$U = N(N - 1)\mu_1 + \frac{N}{3} N^2 \mu_2 + 2\,\frac{N}{3} N^2 \mu_3, \qquad (7)$$


where $\pi_i \tau_i$ has been contracted to $\mu_i$.


4.2 Failure detection and isolation 


We propose to divide the memories into sections and have a fault 
detector at the end of each section. Further, each section would have 
a bypass and, in case of a detected failure, a switch would be actuated 
to pass on to the next section the data stream at the output of the 
bypass (Fig. 11). Thus, effectively, the consequence of the fault is 
isolated to one section. A possible realization of the switch is shown 
in Fig. 12. 

A Fault Detector (FD) would compare the data streams at the 
outputs of the memory section and the bypass, and it would decide 
that failure has occurred when the evident modification to the stream 
in passage through the memory section violates existing constraints. 
The particular constraint of the several that exist in our case and 
which we use is the following: There may never be a change of any 
field that is already nonzero. Detecting any such illegal changes will 
catch failures both in AUs and the memories. The detection will, of 
course, rely on the output from the bypass being a flawless replica of 
what entered that section. If necessary, redundancies and error control 
could be implemented on the bypasses to make that more sure.

Fig. 11—Fault isolation using bypasses. Whenever the Fault Detector (FD) detects
constraint violations, it switches input to next section to output of the bypass from the
previous section.
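
The detection rule can be pictured with a small sketch. Everything about the representation here is an assumption made for illustration: a slot is modelled as a dictionary of integer field values, and the two function names are hypothetical. Only the rule itself (a field that is already nonzero may never change) and the field names come from the text.

```python
# Hypothetical model: a slot is a dict of integer fields, 0 meaning "not yet written".
FIELDS = ("SYNC", "BUSY", "RQST", "SNDR", "RCVR", "DATA")


def violates_constraint(entered, left):
    """True if a field that was nonzero on entering the section differs on leaving it."""
    return any(entered[f] != 0 and left[f] != entered[f] for f in FIELDS)


def next_section_input(entered, left):
    """Pass on the bypass copy when a fault is detected, otherwise the section output."""
    return entered if violates_constraint(entered, left) else left


empty = dict.fromkeys(FIELDS, 0)
written = {"SYNC": 1, "BUSY": 1, "RQST": 0, "SNDR": 5, "RCVR": 9, "DATA": 0xAB}
corrupted = dict(written, SNDR=7)

print(violates_constraint(empty, written))      # an AU filling an empty slot is legal
print(violates_constraint(written, corrupted))  # overwriting a set field is not
```

A real detector would presumably latch the switch once a violation is seen rather than decide slot by slot; the per-slot function above is only meant to make the constraint concrete.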


With the memories divided into sections and with fault detection 
and isolation in place, failure impacts will be reduced. Unless the fault 
is in the fault detector itself, then if each memory is divided into m 
sections, only the N/m stations within a section will be affected by a 
fault. The disruption depends on which section and which point within 
the section is involved. Again assuming equal likelihoods for locations, 
we can derive the average number of potential connections that are 
disrupted and find this, to a good approximation, amounting to
$0.75\,N^2/m$.

Suppose that fault detectors have their own failure rate $\pi_4$ for
failures that produce an open line and $\pi_5$ for switching off a section
when it should not be. Further, suppose that these have expected
repair times $\tau_4$ and $\tau_5$, or $\mu_4 = \pi_4 \tau_4$ and $\mu_5 = \pi_5 \tau_5$. On average, these
events disrupt, respectively, $N^2/3$ and $N^2/(2m)$ potential connections.

The total EPAFI value, with division into $m$ sections and fault
detection and isolation, is then


$$U_m = N(N - 1)\mu_1 + \frac{0.75\,N^3 \mu_2}{m} + \frac{1.5\,N^3 \mu_3}{m} + \frac{2 m N^2 \mu_4}{3} + N^2 \mu_5. \qquad (8)$$


Fig. 12—Realization of two-way switch for fault isolation. FD outputs $F$ and $\bar{F}$ are
complementary, signifying FAULT and NO FAULT.

The first and last terms in eq. (8) are much smaller than the others
and may be neglected. The value of $m$ that results in minimum $U_m$ is


$$m = \frac{3}{2} \sqrt{\frac{N(\mu_2 + 2\mu_3)}{2\mu_4}}, \qquad (9)$$


and the failure impact, with the first and last terms in eq. (8) neglected,
comes to


$$U_{\mathrm{opt}} = 2 N^{5/2} \sqrt{\frac{\mu_4 (\mu_2 + 2\mu_3)}{2}}. \qquad (10)$$


This should be compared with the expected failure impact without
sectional detection and isolation, as found in eq. (7). With the first term
of eq. (7) likewise neglected, the improvement ratio is


$$\frac{U}{U_{\mathrm{opt}}} = \frac{1}{3} \sqrt{\frac{N(\mu_2 + 2\mu_3)}{2\mu_4}}. \qquad (11)$$


Further improvement is possible by instituting super sections by which 
a number of consecutive sections would be bypassed and again fault- 
tested and isolated, as shown in Fig. 13. Indeed, one can take the 
hierarchy of protection to any number of levels. 

With just one level of protection and, say, N = 10,000, and the 
different failure rates and repair times comparable to each other, the 
optimum number of sections would be around 185, or 55 AUs to one 
section. The improvement over no protection would be by a factor of 
40. A second level of protection would increase the improvement by a 
further factor of around 5. Asymptotically, as the hierarchy of protec- 
tion is taken to higher levels, the functional dependence of EPAFI on 
N becomes quadratic, which is the relationship that applies when the 
effect of a failure is confined to a single AU. 
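
The quoted figures for $N$ = 10,000 can be checked numerically. The sketch below assumes the dominant terms only (the per-subscriber term and the spurious-switching term are dropped, as in the text) and takes all $\mu$ values equal, which is one reading of "comparable to each other". The function names are hypothetical.

```python
import math


def epafi_unprotected(N, mu2, mu3):
    """Dominant terms of the unprotected impact: (N^3 / 3)(mu2 + 2 mu3)."""
    return (N ** 3 / 3) * (mu2 + 2 * mu3)


def epafi_sectioned(N, m, mu2, mu3, mu4):
    """Dominant terms of the impact with m sections per memory."""
    return 0.75 * N ** 3 * (mu2 + 2 * mu3) / m + 2 * m * N ** 2 * mu4 / 3


def optimal_sections(N, mu2, mu3, mu4):
    """Stationary point of epafi_sectioned with respect to m."""
    return 1.5 * math.sqrt(N * (mu2 + 2 * mu3) / (2 * mu4))


N, mu = 10_000, 1.0   # mu is an arbitrary common unit
m_opt = optimal_sections(N, mu, mu, mu)
m_best = min(range(1, 1001), key=lambda m: epafi_sectioned(N, m, mu, mu, mu))
improvement = epafi_unprotected(N, mu, mu) / epafi_sectioned(N, m_opt, mu, mu, mu)

print(f"m_opt = {m_opt:.1f}, best integer m = {m_best}, "
      f"AUs per section = {N / m_opt:.0f}, improvement = {improvement:.1f}")
```

The brute-force search over integer $m$ lands next to the analytic optimum, and both agree with the "around 185" sections and the factor of 40 quoted above.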

V. CONCLUSIONS 


We have proposed a switch architecture that supports circuit- and 
packet-switched communications. Both kinds of communications can 
proceed at widely varying rates: Circuits can be set up with different 
capacities, selectable as a binary fraction or multiple of a basic capac- 
ity, while packet-switched communications share in the pool of the 
total switch capacity that is not in use at any given time. Thus, the 
proposed switch could cater efficiently in mediating real-time signals
and data. Specifically, it could be a PBX that, apart from voice, could
provide other circuit- and packet-switched services.

Fig. 13—Fault detection and isolation in super sections.

The possibility for the two modes is brought about by having 
enlarged time slots that can include addressing information, and then 
by making these of fixed length so that they can be made available 
regularly. Further, the switching is performed by access units that 
write and read on serial memories on which synchronization can be 
maintained without interruptions and information transfers can occur 
without collisions. 

We have proposed that data packets be fixed at 192 bits, or 24 
samples of pulse-code-modulated voice. This limits the delay due to 
packetization to 3 ms. The total delay, which includes propagation 
along the memories, will then be less than 4 ms, even in a very large 
switch. 

It is recognized that a switch used in telephony should conform to 
very high standards of reliability. We have proposed a scheme of fault 
detection and isolation applicable to memories as in our switch. This 
would substantially overcome any added vulnerability due to the serial 
nature of the signal paths. However, other issues (e.g., overall system 
reliability) not addressed by us remain to be resolved. In summary, 
our proposed switch offers the possibility of integrating voice and data 
in a way that would preserve the quality and reliability of voice 
communications and therefore, in turn, could be integrated with the 
telephone system at large. We believe that, provided no compromises 
need to be made, very real benefits flow from having all communica- 
tions mediated by a common facility. It is possible that our proposed 
switch could meet such objectives.


In most time-shared computer systems a program is processed by the central
processing unit for, at most, a fixed period of time called a time slice, or 
quantum. If the program requires more processing after it has received its 
quantum, it is placed at the end of a run queue. This procedure is repeated 
until the program has finished executing. To the user who submitted the 
program the two most important performance measures of such a system are 
the mean and variance of the program’s total elapsed time of execution. This 
total elapsed time is often referred to as the “response time”. In this paper we 
investigate the effect of the quantum size on the mean and variance of the 
response time. 


I. INTRODUCTION


The round-robin queue has been studied by several authors as a 
model of time-shared computer systems. In a time-shared system, the 
arrivals of requests for service as well as the service times may be 
thought of as random variables. From the user’s point of view, the two 
most important measures of performance in such a system are the 
mean and variance of the response time. The round-robin discipline 
implicitly favors jobs with shorter service times, in the sense that the 
mean response time is approximately a linear function of the service
time.¹ Thus far, however, the variance has proved to be intractable in
the case of a general service-time distribution. In the case of exponential
service times, Muntz has found the Laplace transform of the
waiting-time distribution.²

The round-robin model can be described as follows: New arrivals 
join the end of the queue, and all jobs in the queue are served on a 
first-come first-served basis until they have completed their service 
requirement or have received one quantum of service. When a job has 
completed service, it leaves the system, and the next job in the queue 
begins service immediately. If a job requires more service after receiv- 
ing its quantum, it rejoins the queue. In this paper we assume that the 
arrival process is Poisson but the service times are governed by a 
general distribution. Overhead due to switching between jobs can be 
included by adjusting the service requirement. For simplicity of ex- 
position we assume that the quantum is constant. Variable quantum 
sizes also yield to our method of analysis. 
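
To make the discipline concrete, here is a minimal deterministic sketch of the round-robin rule for a given list of jobs. Two details are assumptions of this sketch, not statements from the paper: a job that arrives at the exact instant a quantum expires is placed ahead of the job being fed back, and a job fed back into an empty queue resumes service at once. The function name is hypothetical.

```python
from collections import deque


def round_robin_waits(jobs, q):
    """Total time each job spends queued (not in service).

    jobs: list of (arrival_time, service_requirement); q: constant quantum.
    """
    order = sorted(range(len(jobs)), key=lambda k: jobs[k][0])
    remaining = [service for _, service in jobs]
    waits = [0.0] * len(jobs)
    joined = [0.0] * len(jobs)      # time at which each job last joined the queue
    queue = deque()
    t = float("-inf")
    nxt = 0                         # position in `order` of the next arrival
    finished = 0
    while finished < len(jobs):
        if not queue:               # server idle: jump to the next arrival
            k = order[nxt]
            nxt += 1
            t = max(t, jobs[k][0])
            joined[k] = jobs[k][0]
            queue.append(k)
        k = queue.popleft()
        waits[k] += t - joined[k]
        run = min(q, remaining[k])  # one quantum, or less if the job completes
        end = t + run
        while nxt < len(order) and jobs[order[nxt]][0] <= end:
            a = order[nxt]          # arrivals during this pass join the queue
            nxt += 1
            joined[a] = jobs[a][0]
            queue.append(a)
        t = end
        remaining[k] -= run
        if remaining[k] > 1e-12:    # needs more service: rejoin the end of the queue
            joined[k] = t
            queue.append(k)
        else:
            finished += 1
    return waits


# A 1.5-unit job is interrupted by a 0.5-unit job arriving mid-quantum (q = 1).
print(round_robin_waits([(0.0, 1.5), (0.5, 0.5)], 1.0))
```

In the example the long job waits exactly as long as the short job's service, which is the kind of two-job interaction that the light traffic analysis later in the paper integrates over.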

In this paper we address the following questions: 

1. What value of the quantum minimizes the mean sojourn time of 
a given class of jobs? 

2. For a given class of jobs what is the variance of the sojourn time 
and what quantum minimizes the variance in the sojourn time? 

Here “sojourn time” refers to the total amount of time that a job is 
in the queueing system, both in the queue and in service. To answer 
the second question, we use a new light traffic-heavy traffic interpo- 
lation (which we will call the RS interpolation) developed by M. 
Reiman and B. Simon, which makes use of a “light traffic derivative”. 

In Section II we describe exactly how one calculates the mean 
waiting time in a round-robin queue. In Section III the results on 
minimizing the response time and a simple method for finding the 
optimal quantum are presented. The RS interpolation is described in 
Section IV, and in Section V we present some numerical examples. 


II. THE MEAN WAITING TIME


The mean waiting time in a round-robin queue has been studied by 
several authors.*° By “waiting” time we mean the total amount of 
time that the job spends in the queue (but not in service). In this 
section we describe the authors’ analysis and set the notation that will 
be used throughout the paper. As mentioned above, we assume that 
the arrival process is Poisson with rate $\lambda$ and that the service times of
newly arriving jobs are independent, identically distributed random
variables with distribution function $F$ and density $f$. Let $q$ denote the
quantum size. We say that a job in the system is type $j$ if its service
time requirement as a newly arriving job is between $(j - 1)q$ and $jq$,
$j = 1, 2, \ldots$. A type $j$ job is said to be type $(i, j)$, $1 \le i \le j$, when it has
received between $(i - 1)q$ and $iq$ units of service.

The following are additional notations:

$m$ is the mean service time.

$\rho$ is the traffic intensity, $\lambda m$.

$p_j$ is the probability a newly arriving job is type $j$.

$m_{ij}$ is the mean amount of time that a type $(i, j)$ job in the queue
will occupy the server the next time it receives service.

$M_{ij}$ is the second moment of the amount of time that a type $(i, j)$
job in the queue will occupy the server the next time it receives
service.

$R_{ij}$ is the mean forward recurrence time of a type $(i, j)$ job.

$Q_{ij}$ is the mean number of type $(i, j)$ jobs in the queue.

$w_i$ is the mean amount of time that a job must wait in the queue
the $i$th time it is in the queue.

$W$ is the mean amount of time that an arbitrary job must wait in
the queue before it completes its total service-time requirement.


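2.1 Linear equations for the $w_i$


A derivation of the following equations is contained in Ref. 5.


$$w_1 = \sum_{j \ge 1} \sum_{1 \le i \le j} \bigl( m_{ij} Q_{ij} + \lambda p_j m_{ij} R_{ij} \bigr), \qquad (1)$$

and for $n \ge 2$

$$w_n = \sum_{j \ge n} \sum_{1 \le i \le j-n+1} \bigl( m_{n+i-1,j}\, Q_{ij} + \lambda p_j m_{ij}\, m_{n+i-1,j} \bigr) + \sum_{i=1}^{n-1} \Bigl( \sum_{k \ge i} \lambda p_k m_{ik} \Bigr) (w_{n-i} + q). \qquad (2)$$


Using Little's Law, which takes the form $Q_{ij} = \lambda p_j w_i$ in this case, we
can eliminate $Q_{ij}$ from (1) and (2). One can easily verify that the
matrix form of the equations for the $w_i$ is


$$w = \rho M w + \rho b, \qquad (3)$$


where $w$ is the vector whose $i$th component is $w_i$, and $M$ and $b$ are a
matrix and a vector, respectively, that are independent of $\rho$. Finally,
the mean waiting time, $W$, is given by


$$W = \sum_{j \ge 1} p_j \sum_{i=1}^{j} w_i. \qquad (4)$$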
The following fact can be used to simplify the expression for $b$:


$$b_1 = (2m)^{-1} \sum_{j \ge 1} \sum_{1 \le i \le j} p_j M_{ij},$$

and for $n \ge 2$

$$b_n = q. \qquad (5)$$

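
The linear equations of this subsection can be assembled and solved directly for a small example. The sketch below is one reading of eqs. (1) and (2) with Little's Law substituted. The assumptions are spelled out in the comments: the service time takes finitely many values, passes before the last one last exactly $q$, and the forward recurrence time is $R_{ij} = M_{ij}/(2 m_{ij})$. All function names are hypothetical.

```python
import math


def solve(A, c):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(c)
    M = [row[:] + [c[r]] for r, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def round_robin_mean_waits(service, lam, q):
    """Return (w, p), both 1-based: w[n] is the mean n-th wait, p[j] = P(type j).

    service: list of (value, probability) for a discrete service-time law.
    """
    J = max(math.ceil(s / q) for s, _ in service)
    p = [0.0] * (J + 1)
    last1 = [0.0] * (J + 1)          # p_j * (mean length of the final pass)
    last2 = [0.0] * (J + 1)          # p_j * (second moment of the final pass)
    for s, pr in service:
        j = math.ceil(s / q)
        r = s - (j - 1) * q
        p[j] += pr
        last1[j] += pr * r
        last2[j] += pr * r * r

    def pm(i, j):                    # p_j * m_ij
        return p[j] * q if i < j else last1[j]

    def pM(i, j):                    # p_j * M_ij
        return p[j] * q * q if i < j else last2[j]

    def m(i, j):                     # m_ij (0 for types of zero probability)
        return q if i < j else (last1[j] / p[j] if p[j] else 0.0)

    g = [0.0] + [lam * sum(pm(i, k) for k in range(i, J + 1)) for i in range(1, J + 1)]
    A = [[0.0] * J for _ in range(J)]
    c = [0.0] * J
    for n in range(1, J + 1):
        A[n - 1][n - 1] += 1.0
        for j in range(n, J + 1):
            for i in range(1, j - n + 2):
                A[n - 1][i - 1] -= lam * pm(n + i - 1, j)     # Q_ij = lam p_j w_i
                if n == 1:
                    c[0] += lam * pM(i, j) / 2                # lam p_j m_ij R_ij
                else:
                    c[n - 1] += lam * pm(i, j) * m(n + i - 1, j)
        for i in range(1, n):        # jobs that arrived during earlier waits and passes
            A[n - 1][n - i - 1] -= g[i]
            c[n - 1] += g[i] * q
    return [0.0] + solve(A, c), p


def mean_wait(w, p):
    """Eq. (4): mean total wait of an arbitrary job."""
    return sum(p[j] * sum(w[1:j + 1]) for j in range(1, len(p)))


service = [(1, 0.5), (3, 0.5)]       # mean 2, second moment 5
w, p = round_robin_mean_waits(service, 0.2, 1)
print([round(v, 4) for v in w[1:]], round(mean_wait(w, p), 4))
```

Two sanity checks are available without simulation. With $q$ at least as large as the longest job the system is FIFO, and $w_1$ should reduce to the Pollaczek-Khinchine value $\lambda E[S^2]/(2(1 - \rho))$. For any $q$, the mean work waiting in the queue should equal $\rho E[S^2]/(2(1 - \rho))$, because total unfinished work does not depend on the order of service.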


2.2 Favoring a class of jobs


Using the above results we can compute the mean waiting time $W^1$
of a particular class of jobs if we are given the service-time distribution
$F_1$ of arrivals of that class:


$$W^1 = \sum_{j \ge 1} [1 - F_1((j - 1)q)]\, w_j.$$
= 

Figure 1 suggests that, at least in some fairly typical cases, the 
quantum size that minimizes the mean waiting time of a preferred 
class of jobs is neither very big [which is, in essence, First-In First- 
Out (FIFO)] nor very small (processor-sharing), but is just big enough 
to let all the preferred jobs complete service without having to feed 
back. 

In this example, $F_1$ is uniform between one and three, $F_2$ is uniform
between four and eight, and $F = 0.75 F_1 + 0.25 F_2$. In addition, $\lambda = 0.1$,
so $\rho = 0.3$. Note that $q = 3$ minimizes $W^1$ over all $q$, and in
addition, the mean waiting time of the complementary class decreases
with increasing $q$.


2.3 A simple method for approximating the optimal quantum size 


Since the expression for $W^1$ is complicated, it is quite difficult to
find the optimal quantum $q_0$ without significant computational effort.
The following observation (from numerical experiments) yields a
simple method for finding $q_0$.

Observation: The value of $q$ that minimizes $dW^1(0)/d\rho$ is close
to the value of $q$ that minimizes $W^1$ (for arbitrary values of $\rho$, $0 < \rho < 1$).

An easy calculation using (3) implies that


$$\frac{dW^1(0)}{d\rho} = \sum_{j \ge 1} [1 - F_1((j - 1)q)]\, b_j.$$


Using (5) we see that


$$\frac{dW^1(0)}{d\rho} = b_1 + q \sum_{j \ge 1} [1 - F_1(jq)].$$
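
The light traffic slope is cheap to evaluate, which is the point of the observation. The sketch below applies it to the example of Section 2.2 (type I uniform on (1, 3) with weight 0.75, type II uniform on (4, 8) with weight 0.25) and scans a grid of quanta. It assumes that every pass before a job's last lasts exactly $q$, so that $b_1$ is the expected sum of squared pass lengths divided by $2m$. The function names and the grid are hypothetical choices.

```python
def sum_sq_passes_uniform(lo, hi, q):
    """E[sum of squared pass lengths] for a service time uniform on (lo, hi)."""
    total, j = 0.0, 1
    while (j - 1) * q < hi:
        left, right = max(lo, (j - 1) * q), min(hi, j * q)
        if right > left:                       # jobs needing exactly j passes
            base = (j - 1) * q
            total += (j - 1) * q * q * (right - left)          # j - 1 full quanta
            total += ((right - base) ** 3 - (left - base) ** 3) / 3   # final pass
        j += 1
    return total / (hi - lo)


def tail_uniform(lo, hi, x):
    """P(S > x) for S uniform on (lo, hi)."""
    return min(1.0, max(0.0, (hi - x) / (hi - lo)))


def light_traffic_slope(q, classes, favored):
    """b_1 + q * sum_j [1 - F_1(jq)] for a mixture of uniform classes.

    classes: list of (weight, lo, hi); favored: (lo, hi) of the preferred class.
    """
    mean = sum(wt * (lo + hi) / 2 for wt, lo, hi in classes)
    b1 = sum(wt * sum_sq_passes_uniform(lo, hi, q) for wt, lo, hi in classes) / (2 * mean)
    lo, hi = favored
    tail, j = 0.0, 1
    while j * q < hi:
        tail += tail_uniform(lo, hi, j * q)
        j += 1
    return b1 + q * tail


classes = [(0.75, 1, 3), (0.25, 4, 8)]
grid = [0.5 * k for k in range(1, 17)]         # q = 0.5, 1.0, ..., 8.0
best_q = min(grid, key=lambda q: light_traffic_slope(q, classes, (1, 3)))
print(best_q, round(light_traffic_slope(best_q, classes, (1, 3)), 4))
```

On this grid the slope is smallest at the quantum that just lets every type I job finish in one pass, in line with the minimizer reported for $W^1$ in this example.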


III. THE RS INTERPOLATION


3.1 Description


We describe the interpolation as it is applied to the steady-state
waiting time distribution in the round-robin queue with Poisson
arrivals. The interested reader is directed to Ref. 3 for a general
treatment of the method. Let $W_n(\rho)$, $0 \le \rho < 1$, be the $n$th moment of
the steady-state waiting time distribution as a function of $\rho$, the traffic
intensity. Let $\hat{W}_n(\rho) = (1 - \rho)^n W_n(\rho)$ when $0 \le \rho < 1$, and let
$\hat{W}_n(1) = \lim_{\rho \to 1} (1 - \rho)^n W_n(\rho)$. $\hat{W}_n(1)$ is well-defined and finite by the argument
in Ref. 6.

Fig. 1—Expected sojourn times as a function of the quantum for (a) type II jobs and
(b) type I jobs.

Note that $\hat{W}_n(0) = W_n(0)$ is zero and $\hat{W}_n(1)$ can be calculated as the
heavy traffic limit using a diffusion process. One can interpolate
between light and heavy traffic to get the approximation formula:


$$W_n(\rho) \approx \frac{\rho\,\hat{W}_n(1) + (1 - \rho)\,\hat{W}_n(0)}{(1 - \rho)^n} = \frac{\rho\,\hat{W}_n(1)}{(1 - \rho)^n}.$$

This idea is, of course, well known. The novelty of the RS interpolation
is that it makes use of the derivative of $\hat{W}_n(\rho)$ at $\rho = 0$, which
we will denote by $\hat{W}'_n(0)$. It is clear that $\hat{W}'_n(0) = W'_n(0)$. The RS
interpolation is


$$W_n(\rho) \approx \frac{\rho(1 - \rho)\,W'_n(0) + \rho^2\,\hat{W}_n(1)}{(1 - \rho)^n}.$$


3.2 The light traffic derivative 


Let $q$ be a fixed quantum, and let $\sigma(x, a, b)$ be the total waiting time
of a tagged job, $J_1$, that requires $a$ units of service given that in the
entire history of the system exactly one other job, $J_2$, arrives $x$ units
of time after $J_1$ and requires $b$ units of service. Here, $a$ and $b$ are
nonnegative real numbers, and $x$ is any real number, with negative $x$
implying that $J_2$ arrived before $J_1$. Figure 2 describes $\sigma$.

Let


$$\sigma_n(a, b) = \int_{-\infty}^{\infty} \sigma(x, a, b)^n\, dx$$


and let $F_i$ be the service-time distribution of $J_i$, $i = 1, 2$. The following
theorem allows us to calculate $W'_n(0)$.


Theorem 1:


$$W'_n(0) = E(F_2)^{-1} \int_0^\infty \int_0^\infty \sigma_n(a, b)\, dF_1(a)\, dF_2(b).$$


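The proof of a more general version of this result is contained in
Ref. 3. We now present a formula for $W'_n(0)$ using the above theorem.
A straightforward calculation yields


$$\begin{aligned}
\sigma_n(a, b) ={}& \{b - \max(0, \lfloor b/q \rfloor - \lfloor a/q \rfloor)\,q\}^{n+1}/(n + 1) \\
&+ \max(0, \lfloor b/q \rfloor - \lfloor a/q \rfloor)\,[(\lfloor a/q \rfloor + 1)^{n+1} - \lfloor a/q \rfloor^{n+1}]\,q^{n+1}/(n + 1) \\
&+ \max(0, \lfloor a/q \rfloor - \lfloor b/q \rfloor)\,q\,b^n + q^{n+1} \sum_{k=0}^{\min} k^n,
\end{aligned}$$


where $\min = \min(\lfloor a/q \rfloor, \lfloor b/q \rfloor)$ and $\lfloor x \rfloor$ is the greatest integer less than
$x$.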
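Hence, since $\sigma_n(a, b)$ depends on $a$ only through $i = \lfloor a/q \rfloor$ (written $\sigma_n(i, b)$ below),


$$\begin{aligned}
W'_n(0) &= E(F_2)^{-1} \sum_{i,j \ge 0} \int_{iq}^{(i+1)q} \int_{jq}^{(j+1)q} \sigma_n(a, b)\, dF_2(b)\, dF_1(a) \\
&= E(F_2)^{-1} \sum_{i,j \ge 0} \bigl( F_1((i + 1)q) - F_1(iq) \bigr) \int_{jq}^{(j+1)q} \sigma_n(i, b)\, dF_2(b).
\end{aligned}$$

And


$$\begin{aligned}
\int_{jq}^{(j+1)q} \sigma_n(i, b)\, dF_2(b) ={}& M^{F_2}_{n+1}[\max(0, j - i)q]/(n + 1) + M^{F_2}_n(0)\,\max(0, i - j)\,q \\
&+ M^{F_2}_0(0) \Bigl\{ q^{n+1} \sum_{k=0}^{\min(i,j)} k^n + \max(0, j - i)\,[(i + 1)^{n+1} - i^{n+1}]\,q^{n+1}/(n + 1) \Bigr\}.
\end{aligned}$$

Here $M^F_n(x) = \int_{jq}^{(j+1)q} (b - x)^n\, dF(b)$.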
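
The closed form for $\sigma_n(a, b)$ given earlier in this subsection is easy to mistype, so a direct transcription is a useful check. The sketch below uses `math.floor`, which agrees with "the greatest integer less than $x$" whenever $a/q$ and $b/q$ are not integers; that restriction is an assumption of the sketch. The comparison values in the comments were obtained by tracing the two-job schedule by hand for $q = 1$, $n = 1$; they are not figures from the paper.

```python
import math


def sigma_n(n, a, b, q):
    """Integral over x of sigma(x, a, b)**n, from the closed form (n >= 1)."""
    ia, ib = math.floor(a / q), math.floor(b / q)
    extra_b = max(0, ib - ia)        # full quanta by which J2 outlasts J1
    extra_a = max(0, ia - ib)        # full quanta by which J1 outlasts J2
    term1 = (b - extra_b * q) ** (n + 1) / (n + 1)
    term2 = extra_b * ((ia + 1) ** (n + 1) - ia ** (n + 1)) * q ** (n + 1) / (n + 1)
    term3 = extra_a * q * b ** n
    term4 = q ** (n + 1) * sum(k ** n for k in range(min(ia, ib) + 1))
    return term1 + term2 + term3 + term4


# Both jobs fit in one quantum: J1 waits only for J2's residual service, b**2 / 2.
print(sigma_n(1, 0.5, 0.5, 1.0))
# J1 needs two passes, J2 one: the residual term plus q * b for arrivals in J1's first pass.
print(sigma_n(1, 1.5, 0.5, 1.0))
# J2 needs two passes, J1 one: (b - q)**2 / 2 + q**2 / 2.
print(sigma_n(1, 0.5, 1.5, 1.0))
```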


3.3 The heavy traffic limit 


The queueing system under consideration in this paper is a special 
case of the multiclass feedback queue analyzed in Ref. 6. Here we 
follow the development in Ref. 7 for the readers’ convenience. 

Let $U(t)$ denote the unfinished work process. Formally,


$$U(t) = V(t) - \inf_{0 \le s \le t} \{V(s)\},$$


where $V(t) = L(t) - t$ and $L(t)$ is the total amount of work entering
the system in $[0, t]$. (We assume the system is empty at $t = 0$.) Define
a sequence of systems whose parameters, and queue-length and sojourn-time
processes are indexed by $n \ge 1$, and consider the normalized
processes

$$\hat{U}^{(n)}(t) = \frac{U^{(n)}(nt)}{\sqrt{n}}, \qquad n \ge 1,$$

over some finite interval, which we normalize to $[0, 1]$ for convenience.
For a heavy traffic limit we assume $\lambda^{(n)} \to m^{-1}$ as $n \to \infty$. We have the
result of D. L. Iglehart and W. Whitt (1971).


Theorem 2: If

$$\lim_{n \to \infty} (\rho^{(n)} - 1)\sqrt{n} = c, \qquad -\infty < c < \infty,$$

then as $n \to \infty$

$$\hat{U}^{(n)} \Rightarrow \hat{U} = \mathrm{RBM}[c, m^{-1}(s + m^2)],$$

where $\Rightarrow$ denotes weak convergence.

Here $s$ is the variance in the service times, and $m^2$ is the limiting variance in
the interarrival times (the arrivals being Poisson with rate tending to $m^{-1}$).
RBM$(d, \sigma^2)$ is one-dimensional reflected
Brownian motion with drift $d$ and infinitesimal variance $\sigma^2$.

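Now consider the normalized queue-length process

$$\hat{Q}^{(n)}_{ij}(t) = \frac{Q^{(n)}_{ij}(nt)}{\sqrt{n}}, \qquad 0 \le t \le 1,\ n \ge 1,\ j \ge 1,\ 1 \le i \le j.$$

We have

$$\hat{U}^{(n)}(t) \approx \sum_{j \ge 1} \sum_{i=1}^{j} \hat{Q}^{(n)}_{ij}(t) \sum_{l=i}^{j} m_{lj}.$$

Here $\approx$ means


$$\lim_{n \to \infty} \sup_{0 \le t \le 1} \Bigl| \hat{U}^{(n)}(t) - \sum_{j \ge 1} \sum_{i=1}^{j} \hat{Q}^{(n)}_{ij}(t) \sum_{l=i}^{j} m_{lj} \Bigr| = 0$$


in probability.

Theorem 3: Let

$$Y^{(n)}(t) = \sup \bigl| \lambda p_{j'} \hat{Q}^{(n)}_{ij}(t) - \lambda p_j \hat{Q}^{(n)}_{i'j'}(t) \bigr|,$$

where the supremum is taken first over all $1 \le i \le j$ and $1 \le i' \le j'$ and
then over all $j, j' \ge 1$. If $\lambda^{(n)} \to m^{-1}$, then as $n \to \infty$, $\sup_{0 \le t \le 1} Y^{(n)}(t)$
converges in probability to zero.

Using these results one can prove the following.


Theorem 4: Under the conditions of Theorem 2,

$$\hat{Q}^{(n)}_{ij} \Rightarrow \lambda p_j \Gamma^{-1} \hat{U}$$

as $n \to \infty$, where

$$\Gamma = \lambda \sum_{j \ge 1} p_j \sum_{i=1}^{j} i\, m_{ij}.$$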

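Next we consider sojourn times. Again we use the normalization

$$\hat{\sigma}^{(n)}_l(t) = \frac{\sigma^{(n)}_l(nt)}{\sqrt{n}}, \qquad 0 \le t \le 1,\ n, l \ge 1.$$

For single-pass jobs (i.e., $l = 1$),

$$\hat{\sigma}^{(n)}_1(t) \approx \sum_{j \ge 1} \sum_{1 \le i \le j} m_{ij}\, \hat{Q}^{(n)}_{ij}(t).$$

So from Theorem 4 we see that

$$\hat{\sigma}^{(n)}_1(t) \approx \Gamma^{-1} \hat{U}^{(n)}(t) \sum_{j \ge 1} \sum_{1 \le i \le j} \lambda p_j m_{ij}.$$

Thus, from $\rho^{(n)} \to 1$ and Theorem 3 one can prove Theorem 5.

Theorem 5: Under the conditions of Theorem 2,

$$\hat{\sigma}^{(n)}_l \Rightarrow \Gamma^{-1} \hat{U}$$

as $n \to \infty$.

Let $\hat{\sigma}_1(t) = \Gamma^{-1} \hat{U}(t)$ so that, taking $c = -1$,

$$\hat{\sigma}_1(t) = \mathrm{RBM}[-\Gamma^{-1}, \Gamma^{-2}(\lambda s + \lambda^{-1})].$$

Since we are interested only in the stationary behavior of the
queueing system, we observe that the stationary density of RBM$(\alpha, \beta)$, $\alpha < 0$, is


$$2|\alpha|\beta^{-1} \exp(2\alpha\beta^{-1}x).$$

So the $k$th moment of the stationary distribution of RBM$(\alpha, \beta)$ is

$$k!\,(2|\alpha|\beta^{-1})^{-k}$$

for $k = 1, 2, \ldots$.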
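
Since the stationary law of the reflected Brownian motion is exponential with rate $2|\alpha|/\beta$, the moment formula can be confirmed by numerical integration. This is a small sketch: the function names, the midpoint rule, and the sample values of $\alpha$ and $\beta$ are arbitrary choices made here.

```python
import math


def rbm_stationary_moment(k, alpha, beta):
    """k-th stationary moment of RBM(alpha, beta), alpha < 0, from the closed form."""
    rate = 2 * abs(alpha) / beta
    return math.factorial(k) / rate ** k


def moment_by_quadrature(k, alpha, beta, steps=200_000):
    """Midpoint-rule integral of x**k times the stationary density."""
    rate = 2 * abs(alpha) / beta
    upper = 60.0 / rate                 # the density is negligible beyond this point
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x ** k * rate * math.exp(-rate * x) * h
    return total


alpha, beta = -1.0, 3.0
for k in (1, 2, 3):
    print(k, rbm_stationary_moment(k, alpha, beta),
          round(moment_by_quadrature(k, alpha, beta), 6))
```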


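Let $w_i^k(\rho)$, $0 \le \rho < 1$, be the $k$th moment of the amount of time a
job waits in the queue the $i$th time in the queue, and let


$$\hat{w}_i^k(\rho) = \begin{cases} w_i^k(\rho)(1 - \rho)^k, & 0 \le \rho < 1, \\ \lim_{\rho \to 1} w_i^k(\rho)(1 - \rho)^k, & \rho = 1. \end{cases}$$


The above results yield Theorem 6.

Theorem 6: For all $i$,


$$\hat{w}_i^k(1) = \hat{w}_1^k(1) = k!\,[2\Gamma/(\lambda s + \lambda^{-1})]^{-k}.$$

Furthermore, $\hat{W}_n(1)$ is given by

$$\hat{W}_n(1) = \hat{w}_1^n(1) \sum_{j \ge 1} j^n p_j.$$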

V. SOME NUMERICAL EXAMPLES 


In this section, we present numerical examples of two sorts. Figures 
3 and 4 demonstrate the accuracy of the interpolation when applied 
to the mean sojourn time. Here we are comparing the interpolation 
with the exact formula given in Section II. In addition, these examples 
tell us that for some typical load situations, the quantum that mini- 
mizes the sojourn time for the type I jobs does not seriously degrade 
sojourn time for the type II jobs. 

Figures 5 through 10 present the approximation to the variance in 
the waiting time for some typical choices of service times using the 
RS interpolation to approximate the second moment $W_2(\rho)$ of the
waiting time. An interesting thing here is the similarity between the
mean and variance regardless of the service-time distribution. Notice
that the qualitative properties of the mean and variance of the sojourn
time as a function of the quantum size are quite sensitive to the
service-time distribution. In the case of exponential service times it
appears that the waiting time gets smaller as the quantum gets smaller,
whereas in the deterministic and uniform cases it is best to allow all
the favored jobs to finish in one quantum.

The following notations are used in Figs. 3 through 10.
rho is the overall traffic intensity of the example queue.
p is the proportion of jobs that are type I.
G1 is the service-time distribution of the type I jobs.
G2 is the service-time distribution of the type II jobs.
EXP(X) means an exponential distribution with mean X.
D(X) means a deterministic distribution with mean X.
U(X,Y) means a distribution that is uniform between X and Y.

Fig. 3—Expected sojourn time for type I jobs.
Fig. 4—Expected sojourn time for type I jobs.
Fig. 5—Variance of waiting time for type I jobs.
Fig. 6—Variance of waiting time for type II jobs.
Fig. 7—Expected waiting time for type I jobs.
Fig. 8—Variance of waiting time for type I jobs.
Fig. 9—Expected waiting time for type I jobs.
Fig. 10—Variance of waiting time for type I jobs.


An analysis of an integrated voice-data network with Demand Assignment
Time Division Multiple Access (TDMA) is presented using the following 
model: (1) voice calls that cannot be serviced are blocked, whereas requests to 
transmit data messages are queued; (2) no traffic boundaries are assumed, i.e., 
any new traffic arrival may be assigned to any unassigned time slot; (3) 
message lengths are exponentially distributed with the mean voice message 
length assumed to be much larger than the mean data message length; (4) 
traffic requests are generated according to two independent Poisson processes; 
and (5) time slot assignments are made instantaneously and no priorities are 
assumed. Such a model applies to a single-channel TDMA network in which 
voice and data traffic arrivals are serviced on a first-come first-served basis. 
An approximate analysis, based upon physical insight, is presented that yields 
the blocking probability for voice messages, the mean number of queued data 
requests, and the mean value of the peaks of the data queue process. Compar- 
isons with simulation results indicate that the analytical results are very 
accurate. Performance curves are presented and compared with analogous 
results for TDMA networks that handle only one traffic type. 


I. INTRODUCTION 


The popularity of integrated voice-data networks has motivated 
numerous analyses of associated network queueing models.¹⁻⁷ This
paper presents an analysis of a voice-data network using Demand
Assignment Time Division Multiple Access (DA/TDMA). This work
evolved from a study of a multichannel DA/TDMA protocol that
handles different traffic types, and that has received much attention
in the context of satellite communications.⁸,⁹ These networks typically
have a few hundred megabits of capacity and can be used to carry 
real-time digital voice traffic in addition to data traffic. Analysis in 
Ref. 10 indicates that a large multichannel TDMA network (i.e., a 
network with more than 100 communicating traffic nodes) behaves 
much like a single-channel TDMA network with an equivalent number 
of time slots per frame. We therefore concentrate on the simpler 
single-channel network and attempt to characterize its performance 
when used to handle both real-time voice and data traffic. The results 
in this paper carry over to the multichannel case when the number of 
traffic nodes in the network is large.¹⁰

A TDMA protocol divides the broadcast channel into a series of 
time slots of identical width. A prespecified number of time slots forms 
a TDMA frame that continually repeats itself. A demand assignment 
protocol assumes that when a traffic source has a message to transmit, 
it must first send a message to a central controller indicating that it 
wishes to transmit a message to a specified destination address. The 
central controller assigns specific time slots to each received request 
on a noninterfering basis. Only one time slot per frame is assigned to 
each request. Once a time slot is assigned, it remains assigned to the 
same traffic source for the duration of the message. We therefore 
assume that each data message consists of a variable number of 
packets, where the length of a packet is the number of bits per time 
slot. Voice calls are assigned a dedicated time slot for the duration of 
the call. (Full-duplex voice traffic actually requires two time slots per 
frame, one for each direction.) Figure 1 shows an example in which 
there are four time slots per frame. The numbers in each slot specify 
the source and destination addresses. The controller has assigned slot 
1 to the source-destination pair 1-2. Since the message generated by 
source 1 requires more than one time slot, source 1 also uses the first 
time slot in the succeeding frame. The number of time slots per frame 
and number of bits per time slot are design parameters that can vary 
from system to system. Notice, however, that an additional constraint 
might be that the number of frames per second and number of bits 
per time slot must be selected so that each time slot represents a 
64-kb/s circuit, which is necessary to provide toll-quality PCM voice 
transmission. 

Fig. 1—Assignment of source-destination pairs to time slots. 
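The 64-kb/s constraint can be checked with simple arithmetic, since a source that holds one slot per frame sends (bits per slot) × (frames per second) bits each second. The short sketch below illustrates this; the function names and the example values (8000 frames/s with 8 bits per slot, or 512 bits per slot) are assumptions chosen for illustration and do not come from the paper.

```python
def slot_rate_bps(frames_per_second: float, bits_per_slot: int) -> float:
    """Bit rate seen by a source that owns one time slot in every frame."""
    return frames_per_second * bits_per_slot


def frames_per_second_needed(bits_per_slot: int, target_bps: float = 64_000.0) -> float:
    """Frame rate required so that one slot per frame carries target_bps."""
    return target_bps / bits_per_slot


# Illustrative values only (not taken from the paper).
print(slot_rate_bps(8000, 8))
print(frames_per_second_needed(512))
```

Fixing either design parameter therefore determines the other once the 64-kb/s requirement is imposed.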

As the source traffic intensity increases, so does the probability of 
not being able to assign a new voice or data request because all slots 
have already been assigned. New voice requests that cannot be as- 
signed are blocked, whereas unassigned data requests enter a queue. 
As slots become free, requests from the queue are then assigned. The 
network model analyzed here is different from the models used in 
Refs. 1 through 7 in one or more of the following respects: (1) Each 
channel time slot can be assigned to either traffic type. In particular, 
no moving boundaries’®’ are assumed that partition the time slots in 
each frame into a section reserved for voice traffic and a section 
reserved for data traffic. (2) The duration of each message measured 
in number of time slots, or equivalently, the number of TDMA frames, 
is an exponentially distributed random variable. Furthermore, the 
mean voice message length is assumed to be at least an order of 
magnitude larger than the mean data message length. (3) No priorities 
are assumed, so that traffic is serviced on a first-come first-served 
basis. 

To simplify the analysis, approximations based upon physical in- 
sight are used. Results are expressions for voice blocking probability, 
mean number of queued data requests, and the mean value of the 
peaks of the data queue process as functions of the number of time 
slots and traffic parameters. Comparisons with simulation results 
indicate that these analytical results are quite accurate. 

The next section describes the network queueing model in detail. 
Section III presents the analytical results, and Section IV presents 
performance curves and compares them with analogous curves for 
systems having only one input traffic type. 


II. NETWORK QUEUEING MODEL 


The TDMA network is modeled as a c-server queueing system, 
where c is the number of time slots per TDMA frame. We assume 
voice and data traffic requests to be generated according to two 
independent Poisson processes with respective arrival rates $\lambda_v$ and $\lambda_d$. 
We therefore implicitly assume an infinite source model. Service times 
(or message lengths) in both cases are assumed to be exponentially 
distributed, with service rates $\mu_v$ for voice messages and $\mu_d$ for data 
messages. In particular, the mean number of time slots required for a 
voice message is $1/\mu_v$. Notice that in practice the service distribution 
must be discrete, since messages can only last an integral number of 
time slots. The continuous distribution assumed here is a good 
approximation to a “discrete” exponential distribution as long as the 
typical message length is a relatively large number of time slots (i.e., 
> 10). 

Each voice or data arrival demands one time slot per frame for the 
duration of the message. The rates (i.e., bits per second) at which both 
voice and data messages access the channel are therefore identical. 
Voice messages that cannot be assigned a time slot immediately are 
blocked (i.e., disappear), whereas unassigned data requests enter a 
queue. To simplify the analysis, we assume that time slot assignments 
occur instantaneously, rather than at the beginning of the next frame. 
In particular, as soon as a request is generated, it is immediately 
assigned, assuming unassigned slots exist. This approximation is rea- 
sonable as long as the average message lengths are much greater than 
one time slot. Given that there are queued data requests, they are 
assigned as soon as time slots are relinquished by traffic already 
assigned. Voice messages that arrive while data requests are queued 
are therefore blocked. As either the voice or data traffic intensity 
increases, the data traffic tends to grab enough time slots to empty 
any queue that may appear, thereby depriving voice traffic of available 
time slots. One therefore expects that under high traffic intensities, 
voice blocking probability is quite high, whereas the mean data queue 
length (i.e., number of queued requests) is moderate. 
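The model just described is simple enough to simulate directly. The sketch below is one possible event-driven simulation under the stated assumptions (Poisson arrivals, exponential message lengths, instantaneous slot assignment, blocked voice calls cleared, data requests queued first-come first-served). It is not the simulation program used for the results in this paper; the function name, its arguments, and the event count are hypothetical.

```python
import random


def simulate(c, lam_v, lam_d, mu_v, mu_d, n_events=200_000, seed=1):
    """Illustrative simulation of the c-slot voice/data model (not the authors' code).

    Returns (fraction of voice arrivals blocked,
             time-averaged number of queued data requests).
    """
    rng = random.Random(seed)
    v = d = q = 0                  # voice in service, data in service, data queued
    t = area_q = 0.0
    voice_arrivals = voice_blocked = 0
    for _ in range(n_events):
        rates = (lam_v, lam_d, v * mu_v, d * mu_d)
        total = sum(rates)
        dt = rng.expovariate(total)          # time to the next event
        t += dt
        area_q += q * dt
        u = rng.random() * total             # pick which event occurs
        if u < rates[0]:                     # voice arrival
            voice_arrivals += 1
            if v + d < c:
                v += 1
            else:
                voice_blocked += 1           # blocked calls disappear
        elif u < rates[0] + rates[1]:        # data arrival
            if v + d < c:
                d += 1
            else:
                q += 1                       # all slots busy: join the queue
        elif u < rates[0] + rates[1] + rates[2]:   # voice departure
            v -= 1
            if q > 0:                        # freed slot goes to queued data
                q -= 1
                d += 1
        else:                                # data departure
            d -= 1
            if q > 0:
                q -= 1
                d += 1
    p_block = voice_blocked / voice_arrivals if voice_arrivals else 0.0
    return p_block, area_q / t
```

Because a freed slot is always handed to a queued data request first, a voice arrival can be accepted only when the data queue is empty, which is the mechanism behind the high voice blocking expected under heavy load.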

An exact analysis of the model just described can be performed by 
extracting the associated embedded Markov chain. In this case, the 
two-dimensional state defining the embedded Markov chain is (d, v), 
where d and v represent the number of data and voice messages, 
respectively, in the system. Transition probabilities are easily deter- 
mined in terms of the traffic parameters and number of time slots, 
thus a solution for the steady-state probability distribution p(d, v) can 
be theoretically obtained. If we assume that the data message queue 
can be arbitrarily large, however, the number of states in the Markov 
chain becomes infinite. As the dimension of the problem increases, 
the amount of numerical computation required to obtain p(d, v) 
increases, which in turn causes further propagation of round-off errors. 
The exact analysis just outlined was attempted.¹¹ However, it was not 
successful due to finite word-length effects. The approximate analysis 
in the next section is therefore proposed as a simple alternative. 


III. ANALYTICAL RESULTS 


We start by deriving an approximate expression for voice blocking 
probability. In steady state, the minimum number of time slots per 
frame needed to ensure that the system remains stable (i.e., the number 
of queued data requests does not become infinite with probability one) 
is 
$$\left\lfloor \frac{\lambda_d}{\mu_d} \right\rfloor + 1,$$

where $\lfloor x \rfloor$ denotes the largest integer not exceeding x. If the number of 
slots is less than this amount, then the system would be unstable even 
with no additional voice traffic. If the number of slots is greater than 
this amount, then the system must be stable since unassigned voice 
requests disappear, and because voice requests cannot preempt queued 
data requests. All queued data requests must therefore be assigned 
before any voice messages can be assigned. 

The number of time slots per frame available for data messages is a 
random process that varies according to how many time slots are 
assigned to voice traffic. Assuming that the mean service time for 
voice messages ($1/\mu_v$) is orders of magnitude greater than the mean 
service time for data messages ($1/\mu_d$), the voice “state”, i.e., number 
of voice-occupied slots per frame, varies much more slowly than the 
data state, which is the total number of data messages present in the 
system. It is therefore a good approximation to assume that the time 
spent in each voice state is much longer than the time it takes the 
number of data requests present in the system to reach steady-state 
behavior. This “steady-state approximation” is the basis for the anal- 
ysis that follows. Using this approximation, it follows that if the 
number of voice messages in the system is greater than or equal to 


$$v_0 = c - \left\lfloor \frac{\lambda_d}{\mu_d} \right\rfloor, \tag{1}$$

the normalized data traffic intensity conditioned on the number of 
voice-occupied slots, $\lambda_d/[(c - v_0)\mu_d]$, is greater than one, and data 
requests become queued with probability one. Since the number of 
queued data requests is assumed to reach steady-state behavior, this 
queue cannot be emptied until a voice message relinquishes a time 
slot. The steady-state approximation therefore implies that the 
number of voice messages in the system is never greater than $v_0$. Computer 
simulations of the queueing model considered have verified that the 
probability of the voice state v becoming greater than $v_0$ is indeed very 
small when $\mu_v$ is much less than $\mu_d$. The “competition” of voice and 
data traffic for available time slots can therefore be expected to 
produce intermittent queue “spikes,” representing times at which the 
voice state $v = v_0$. During this time, the data queue process experiences 
a “transient instability.” 
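Both the minimum slot count for stability and the voice instability state reduce to a floor operation on the offered data load $\lambda_d/\mu_d$. The sketch below illustrates the computation, assuming the bracket denotes the ordinary floor; the function names are hypothetical, and exact rational arithmetic is used only to avoid floating-point rounding at integer loads. The example values correspond to c = 20, a traffic blend of 0.5, an offered load of 0.7, and a mean data message length of 40 slots.

```python
from fractions import Fraction
from math import floor


def min_slots_for_stability(lam_d, mu_d):
    """Smallest number of slots per frame that keeps the data queue stable."""
    return floor(lam_d / mu_d) + 1


def voice_instability_state(c, lam_d, mu_d):
    """Voice state v0 = c - floor(lam_d / mu_d) at which the data queue is unstable."""
    return c - floor(lam_d / mu_d)


# Illustrative parameters: mu_d = 1/40 per slot, lam_d / mu_d = 7.
mu_d = Fraction(1, 40)
lam_d = Fraction(7, 40)
print(min_slots_for_stability(lam_d, mu_d))
print(voice_instability_state(20, lam_d, mu_d))
```

A slightly smaller data load (for example $\lambda_d/\mu_d = 6.9$) raises the instability state by one, which is the source of the step changes in $v_0$ as the load varies.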

Given that v time slots are assigned to voice traffic, the probability 
that an incoming voice message is blocked is equal to the probability 
that the number of data requests in the system, d, is greater than or 
equal to c—v. Using the steady-state approximation, this is simply 
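the steady-state probability that a queue exists in an M/M/(c − v) 
queueing system and is given by¹² 

$$p(d \ge c-v \mid v) = \frac{p_0}{(c-v)!}\left(\frac{\lambda_d}{\mu_d}\right)^{c-v}\frac{(c-v)\mu_d}{(c-v)\mu_d-\lambda_d}, \tag{2}$$

where 

$$p_0 = \left[\sum_{k=0}^{c-v-1}\frac{1}{k!}\left(\frac{\lambda_d}{\mu_d}\right)^{k} + \frac{1}{(c-v)!}\left(\frac{\lambda_d}{\mu_d}\right)^{c-v}\frac{(c-v)\mu_d}{(c-v)\mu_d-\lambda_d}\right]^{-1}. \tag{3}$$

The blocking probability for voice traffic is therefore 

$$P_B = \sum_{v=0}^{v_0} p(d \ge c-v \mid v)\,p(v), \tag{4}$$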


where p(v) is the probability that v time slots are assigned to voice 
traffic. 

Consider now a blocking system with $v_0$ servers and one Poisson 
input. Given $v < v_0$, let $\phi_v$ denote the probability that a new arrival 
can be served (i.e., even though all servers are not busy, a newly 
arriving request is blocked with probability $1 - \phi_v$). Assuming 
exponential service times, the probability that v servers are busy is known 
to be¹² 

$$p(v) = \frac{\dfrac{1}{v!}\left(\dfrac{\lambda_v}{\mu_v}\right)^{v}\displaystyle\prod_{j=0}^{v-1}\phi_j}{\displaystyle\sum_{i=0}^{v_0}\dfrac{1}{i!}\left(\dfrac{\lambda_v}{\mu_v}\right)^{i}\prod_{j=0}^{i-1}\phi_j}. \tag{5}$$

This exactly describes the voice-data system considered, where the 
“entrance” probability is 

$$\phi_v = p(d < c-v \mid v), \tag{6}$$

which follows from (2), since $p(d < c-v \mid v) = 1 - p(d \ge c-v \mid v)$. 
Substituting (2) and (5) into (4) therefore gives 
the desired result. Notice that because the arrival processes are 
Poisson, the blocking probability $P_B$ is equal to the probability of being in 
a blocking state (i.e., all time slots are busy), which is equal to the 
probability that data requests are queued. 
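The steps leading to the voice blocking probability can be sketched numerically as follows. This is a minimal illustration rather than the program used for the results in this paper: the function names are hypothetical, the entrance probability is taken to be one minus the M/M/(c − v) queueing probability, and a voice arrival that finds the voice state at the instability state $v_0$ is assumed to be blocked with certainty. The caller is expected to supply $v_0 = c - \lfloor \lambda_d/\mu_d \rfloor$.

```python
def erlang_c(a, s):
    """P(all s servers busy) in an M/M/s queue with offered load a = lam/mu < s."""
    term, total = 1.0, 0.0               # term holds a**k / k!
    for k in range(s):
        total += term
        term *= a / (k + 1)
    tail = term * s / (s - a)            # (a**s / s!) / (1 - a/s)
    return tail / (total + tail)


def voice_state_distribution(c, lam_v, mu_v, lam_d, mu_d, v0):
    """Return (p, phi): voice-state probabilities p[0..v0] and entrance
    probabilities phi[0..v0-1] (illustrative helper, not from the paper)."""
    a_d = lam_d / mu_d
    phi = [1.0 - erlang_c(a_d, c - v) for v in range(v0)]
    weights, w = [], 1.0                 # w = (1/v!) (lam_v/mu_v)**v * prod(phi[:v])
    for v in range(v0 + 1):
        weights.append(w)
        if v < v0:
            w *= (lam_v / mu_v) * phi[v] / (v + 1)
    norm = sum(weights)
    return [x / norm for x in weights], phi


def voice_blocking(c, lam_v, mu_v, lam_d, mu_d, v0):
    """Approximate voice blocking probability, averaging over the voice state."""
    p, phi = voice_state_distribution(c, lam_v, mu_v, lam_d, mu_d, v0)
    blocked = [1.0 - f for f in phi] + [1.0]   # assumed: always blocked at v = v0
    return sum(pv * b for pv, b in zip(p, blocked))


# Illustrative case: c = 20, mean voice length 1000 slots, mean data length 40 slots.
print(voice_blocking(20, 0.007, 0.001, 0.175, 0.025, 13))
```

The terms $a^k/k!$ are built up iteratively rather than from explicit factorials, which keeps the computation within floating-point range for frames of a few hundred slots.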

An analogous argument can be applied to compute the mean number 
of queued data requests. Denoting this queue length as q, we have 

$$E(q) = \sum_{v=0}^{v_0} p(v)\,E(q \mid v), \tag{7}$$

where $E(q \mid v)$ is the mean number of queued data requests given v 
assigned voice messages. Assuming $\mu_v \ll \mu_d$ implies that $E(q \mid v)$ is 
approximately equal to the mean queue length for an M/M/(c − v) 
queueing system,¹² i.e., 

$$E(q \mid v) \approx \frac{p_0}{(c-v)!}\left(\frac{\lambda_d}{\mu_d}\right)^{c-v}\frac{(c-v)\lambda_d\mu_d}{[(c-v)\mu_d-\lambda_d]^2}, \quad 0 \le v < v_0, \tag{8}$$


where $p_0$ is given by (3). If $v = v_0$, this expression no longer applies, 
however, since the system becomes unstable. To approximate $E(q \mid v_0)$, 
note that when $v = v_0$, the mean data queue length increases 
approximately at rate $\lambda_d - (c - v_0)\mu_d$ until a voice message relinquishes its 
time slot. At this point the queue starts to empty at rate $(c - v_0 + 1)\mu_d - \lambda_d$. 
As the queue empties, more time slots may be relinquished by 
voice messages, causing the queue to empty at a faster rate. Suppose 
that we assume 

$$E(q \mid v_0) \approx \frac{1}{t_0}\int_0^{t_0} E[q(t)]\,dt, \tag{9}$$

where $t_0$ is the duration of the queue spike and $q(t)$, $0 \le t \le t_0$, is the 
queue length as a function of time (given that v increased from $v_0 - 1$ 
to $v_0$ at t = 0). Furthermore, we assume that $E[q(t)]$ is piecewise linear 
(fluid flow approximation).* Then it is shown in Appendix A that 

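$$E(q \mid v_0) \approx \frac{1}{t_0}\left\{\frac{1}{2}\,\frac{\lambda_d-(c-v_0)\mu_d}{(v_0\mu_v)^2} + \frac{q_{i_0}^2}{2[(c-v_0+i_0+1)\mu_d-\lambda_d]} + \sum_{j=1}^{i_0}\frac{q_{j-1}+q_j}{2(v_0-j)\mu_v}\right\}, \tag{10}$$

where 

$$t_0 = \frac{q_{i_0}}{(c-v_0+i_0+1)\mu_d-\lambda_d} + \sum_{j=0}^{i_0}\frac{1}{(v_0-j)\mu_v}, \tag{11}$$

$$q_i = \frac{\lambda_d-(c-v_0)\mu_d}{v_0\mu_v} - \sum_{j=1}^{i}\frac{(c-v_0+j)\mu_d-\lambda_d}{(v_0-j)\mu_v}, \tag{12}$$

and 

$$i_0 = \max\{i \mid q_i > 0\}. \tag{13}$$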


Substituting (5), (8), and (10) into (7) therefore gives the approximate 
mean queue length. 
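The fluid-flow estimate of the mean queue length during a spike amounts to computing the area under a piecewise-linear curve and dividing by its duration: the queue grows at rate $\lambda_d - (c - v_0)\mu_d$ for the mean time $1/(v_0\mu_v)$ to the first voice departure, and then drains at a rate that increases by $\mu_d$ with each further voice departure. The sketch below follows that construction; the function name and the example values are hypothetical, and mean holding times are used in place of random ones, as in the approximation itself.

```python
def spike_mean_queue(c, v0, lam_d, mu_d, mu_v):
    """Fluid-flow estimate of the mean data queue length during a spike
    (illustrative helper; area under piecewise-linear E[q(t)] over its duration)."""
    rate_up = lam_d - (c - v0) * mu_d        # growth rate while v = v0
    t_first = 1.0 / (v0 * mu_v)              # mean time to the first voice departure
    q = rate_up * t_first                    # peak of the mean queue length
    area = 0.5 * q * t_first                 # first region: rising triangle
    t = t_first
    i = 0                                    # voice departures after the first
    while True:
        drain = (c - v0 + i + 1) * mu_d - lam_d   # emptying rate in this segment
        if i + 1 < v0:
            hold = 1.0 / ((v0 - i - 1) * mu_v)    # mean time to next voice departure
            q_next = q - drain * hold
        else:
            q_next = float("-inf")           # no voice calls left: queue drains to zero
        if q_next > 0:
            area += 0.5 * (q + q_next) * hold     # trapezoid between departures
            t += hold
            q = q_next
            i += 1
        else:
            area += 0.5 * q * q / drain      # final falling triangle
            t += q / drain
            return area / t


# Illustrative values: the spike is a single triangle of height 0.5 and base 2.
print(spike_mean_queue(5, 2, 3.5, 1.0, 0.5))
```

In the example the queue empties before a second voice departure, so the estimate is half the peak value; with slower drainage the loop accumulates one trapezoid per additional voice departure.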

The expressions for voice blocking probability and mean data queue 
length presented thus far have been found to be quite accurate when 
compared with simulation results. To gain further insight into the 
behavior of the system, however, we now attempt to characterize the 
transient instabilities, or queue spikes, which occur when $v = v_0$. 

3.1 Analysis of transient instabilities 


Figure 2 illustrates the problem under consideration. The queue 
spikes appear whenever the voice state is equal to $v_0$. Not shown are 
queues that occur when the voice state v is less than $v_0$. The height of 
each spike is denoted as $d_M$, the duration of each spike is denoted as 
$\tau_w$, and the time between spikes is denoted as T. Figure 2 is not meant 
to indicate a typical sample function for the data message queue 
process. Significant queues appear when $v < v_0$; however, the purpose 
of the following analysis is to determine whether the transient 
instabilities shown in Fig. 2 cause serious performance degradation. 

We begin by computing the distribution of the peak value of each 
spike. Let $p(d, t)$ denote the probability that d data messages are 
queued at time t, given that $v = v_0$. At t = 0 we assume d = 0. The 
following equations can be derived in a straightforward manner,¹² 

$$\frac{d}{dt}\,p(d, t) = \lambda_d\,p(d-1, t) - [(c-v_0)\mu_d + \lambda_d]\,p(d, t) + (c-v_0)\mu_d\,p(d+1, t) \quad \text{for } d > 0 \tag{14a}$$

and 

$$\frac{d}{dt}\,p(0, t) = -\lambda_d\,p(0, t) + (c-v_0)\mu_d\,p(1, t). \tag{14b}$$


Fig. 2—Transient instabilities in the data queue process. 

Solving (14) gives the probability that the maximum queue length is 
equal to d given the time until the first voice departure is t. (Recall 
that as soon as v decreases from $v_0$ to $v_0 - 1$, the mean queue length 
decreases.) We know, however, that the time until the first voice 
departure is exponentially distributed with parameter $v_0\mu_v$, so that 

$$q_M(d_M) = \Pr\{\text{maximum queue length} = d_M\} = \int_0^\infty v_0\mu_v e^{-v_0\mu_v t}\,p(d_M, t)\,dt = v_0\mu_v\,Q_M(v_0\mu_v, d_M), \tag{15}$$

where 

$$Q_M(s, d_M) = \int_0^\infty e^{-st}\,p(d_M, t)\,dt \tag{16}$$

is the Laplace transform of $p(d_M, t)$. Equations (14a) and (14b) 
are standard “birth-death” equations.¹² The Laplace transform, 
$Q_M(s, d_M)$, can be computed directly from (14) and is given by 


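$$Q_M(s, d_M) = \frac{r_2^{d_M}(s)}{(c-v_0)\mu_d\,[r_1(s)-1]}, \tag{17}$$

where 

$$r_1(s) = \frac{1}{2(c-v_0)\mu_d}\left\{s + \lambda_d + (c-v_0)\mu_d + \sqrt{[s + \lambda_d + (c-v_0)\mu_d]^2 - 4(c-v_0)\mu_d\lambda_d}\right\} \tag{18a}$$

and 

$$r_2(s) = \frac{1}{2(c-v_0)\mu_d}\left\{s + \lambda_d + (c-v_0)\mu_d - \sqrt{[s + \lambda_d + (c-v_0)\mu_d]^2 - 4(c-v_0)\mu_d\lambda_d}\right\}. \tag{18b}$$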
We therefore have 

$$q_M(d_M) = q_M(0)\,r_2^{d_M}(v_0\mu_v), \tag{19}$$

where 

$$q_M(0) = \frac{v_0\mu_v}{(c-v_0)\mu_d\,[r_1(v_0\mu_v)-1]}. \tag{20}$$

Since 

$$\sum_{d_M=0}^{\infty} q_M(d_M) = 1, \tag{21}$$

it follows that 

$$q_M(0) = 1 - r_2(v_0\mu_v). \tag{22}$$


The distribution of the maximum of the queue spike is therefore 
geometric with parameter $r_2(v_0\mu_v)$. Consequently, 

$$E(d_M) = \frac{r_2(v_0\mu_v)}{1 - r_2(v_0\mu_v)} \tag{23}$$
and 

$$\operatorname{var}(d_M) = \frac{r_2(v_0\mu_v)}{[1 - r_2(v_0\mu_v)]^2}. \tag{24}$$

Notice that as $\mu_v$ decreases relative to $\mu_d$, $E(d_M)$ increases. 
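The mean spike height depends only on the smaller root of the quadratic $(c-v_0)\mu_d x^2 - [s + \lambda_d + (c-v_0)\mu_d]x + \lambda_d = 0$ evaluated at $s = v_0\mu_v$. The sketch below computes both roots and the mean of the resulting geometric distribution; the function names are hypothetical, and the example uses c = 20, $v_0 = 13$, $\lambda_d = 0.175$, $\mu_d = 0.025$, and $\mu_v = 0.001$ purely for illustration.

```python
from math import sqrt


def spike_roots(s, c, v0, lam_d, mu_d):
    """Roots (r1, r2), r1 >= r2, of m*x**2 - (s + lam_d + m)*x + lam_d = 0,
    where m = (c - v0) * mu_d (requires c > v0)."""
    m = (c - v0) * mu_d
    b = s + lam_d + m
    disc = sqrt(b * b - 4.0 * m * lam_d)
    return (b + disc) / (2.0 * m), (b - disc) / (2.0 * m)


def mean_spike_peak(c, v0, lam_d, mu_d, mu_v):
    """Mean of the geometric peak distribution, r2 / (1 - r2), with s = v0 * mu_v."""
    _, r2 = spike_roots(v0 * mu_v, c, v0, lam_d, mu_d)
    return r2 / (1.0 - r2)


# Illustrative parameters (hypothetical example).
print(mean_spike_peak(20, 13, 0.175, 0.025, 0.001))
```

A useful numerical check is the identity $(c-v_0)\mu_d\,[r_1 - 1][1 - r_2] = v_0\mu_v$, which is what makes the two expressions for the probability of a zero peak agree.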
Fig. 3—Sample path for queue process. Every tenth sample is shown. 


A sample path of the number of queued data requests versus time is 
shown in Fig. 3. (Every 10th sample is plotted.) Queues periodically 
build and empty, which suggests that as an approximation, the mean 
value of the peaks of these buildups is given by (23). This 
approximation becomes more accurate as $\mu_v$ decreases relative to $\mu_d$. Note, 
however, that (23) gives the mean value of the peaks of the queue 
process with initial conditions $v = v_0$ and d = 0. It is possible that the 
mean value of the peaks of the queue process, assuming $v < v_0$, is 
larger than that predicted by (23). [In some cases, the conditional 
mean queue length given by (8) for $v = v_0 - 1$ is in fact larger than 
$E(d_M)$.] A better approximation to the mean value of the peaks of the 
queue process can be obtained by computing the conditional means 
assuming $v = 0, 1, \ldots, v_0$, and then using the distribution $p(v)$ given 
by (5) to form a weighted average. As the present analysis is concerned 
with evaluating the performance degradation caused by transient 
instabilities, this computation was not performed. 

We now compute the mean duration of the queue spike. Denoting 
this quantity as $\tau_w$, it is apparent that 

$$\tau_w = \tau_{w,1} + \tau_{w,2}, \tag{25}$$

where $\tau_{w,1}$ is the mean time it takes the queue to reach its maximum 
value given $v = v_0$, and $\tau_{w,2}$ is the mean time it takes the queue to 
empty. From the previous discussion, 

$$\tau_{w,1} = \frac{1}{v_0\mu_v}. \tag{26}$$
Let $\bar{\tau}_{d,v}$ denote the mean time it takes to reach state d = 0 (i.e., no 
data message queue) given an initial state of d queued data messages 
and v voice-occupied time slots. We have the following transition 
equation, 

$$\bar{\tau}_{d,v} = \frac{1}{\sigma(v)} + p_v\,\bar{\tau}_{d-1,v} + q_v\,\bar{\tau}_{d+1,v} + w_v\,\bar{\tau}_{d,v-1} \tag{27a}$$

with initial condition 

$$\bar{\tau}_{0,v} = 0, \tag{27b}$$

where 

$$\sigma(v) = v\mu_v + (c-v)\mu_d + \lambda_d \tag{28}$$

and $1/\sigma(v)$ is the mean amount of time spent in state (d, v) before a 
state transition occurs, and 

$$p_v = \frac{(c-v)\mu_d}{\sigma(v)}, \tag{29a}$$

$$q_v = \frac{\lambda_d}{\sigma(v)}, \tag{29b}$$

and 

$$w_v = \frac{v\mu_v}{\sigma(v)} \tag{29c}$$

are, respectively, the probabilities of going to states $(d-1, v)$, 
$(d+1, v)$, and $(d, v-1)$ from state (d, v). Equation (27) is a 
two-dimensional difference equation that is nonlinear in v. Notice, however, that 
we desire 


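$$\tau_{w,2} = \sum_{d=0}^{\infty} q_M(d)\,\bar{\tau}_{d,v_0-1} = [1 - r_2(v_0\mu_v)]\sum_{d=0}^{\infty} r_2^{d}(v_0\mu_v)\,\bar{\tau}_{d,v_0-1} = [1 - r_2(v_0\mu_v)]\,D\!\left(\frac{1}{r_2(v_0\mu_v)},\, v_0-1\right), \tag{30}$$

where 

$$D(z, v) = \sum_{d=0}^{\infty} z^{-d}\,\bar{\tau}_{d,v} \tag{31}$$

is the partial z-transform of $\bar{\tau}_{d,v}$. An iterative method for computing 
$D(1/[r_2(v_0\mu_v)], v_0 - 1)$ is discussed in Appendix B. 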

To compute the mean time between unstable periods, we first define 
$T_v$ as the expected value of the first passage time it takes to go from 
an initial voice state $v < v_0$ to voice state $v_0$. The mean time between 
unstable periods is then 

$$T = \sum_{v=0}^{v_0-1} p^*(v)\,T_v, \tag{32}$$

where $p^*(v)$ is the probability that the voice state is v at the end of a 
queue spike (i.e., when d returns to zero). The computation of $p^*(v)$ is 
similar to the computation of $\tau_w$. In particular, let $p(v \mid v_1, d)$ denote 
the probability of v voice-occupied time slots at the end of an unstable 
period given an initial state $(v_1, d)$. The following transition equation 
is easily obtained, 

$$p(v \mid v_1, d) = q_{v_1}\,p(v \mid v_1, d+1) + p_{v_1}\,p(v \mid v_1, d-1) + w_{v_1}\,p(v \mid v_1-1, d), \tag{33a}$$

where $q_v$, $p_v$, and $w_v$ are defined by (29). The initial conditions are 

$$p(v \mid v-1, d) = 0 \tag{33b}$$

and 

$$p(v \mid v_1, 0) = \begin{cases} 1 & \text{if } v = v_1, \\ 0 & \text{otherwise.} \end{cases} \tag{33c}$$

In analogy with (27), (33) is a two-dimensional difference equation 
that is nonlinear in v. As before, we desire 

$$p^*(v) = [1 - r_2(v_0\mu_v)]\sum_{d=0}^{\infty} r_2^{d}(v_0\mu_v)\,p(v \mid v_0-1, d) = [1 - r_2(v_0\mu_v)]\,D\!\left(\frac{1}{r_2(v_0\mu_v)},\, v \mid v_0-1\right), \tag{34}$$

where 

$$D(z, v \mid v_1) = \sum_{d=0}^{\infty} z^{-d}\,p(v \mid v_1, d). \tag{35}$$

An iterative method for computing $D(z, v \mid v_1)$ is discussed in Appendix 
B. 

The mean time between unstable periods, given the initial starting 
state, can be approximated by again assuming voice traffic service 
times are very long relative to data traffic service times. For each voice 
state, we assume that the data traffic exhibits steady-state behavior. 
This leads to the following difference equation, 

$$T_v = t_v + p_v\,T_{v-1} + s_v\,T_{v+1}, \tag{36}$$

where 

$$t_v = \frac{1}{\phi_v\lambda_v + v\mu_v} \tag{37a}$$

is the mean time spent in voice state v before going to state $v - 1$ or 
$v + 1$, $\phi_v$ is the “entrance” probability for an incoming voice message 
and is equal to the probability that a data message queue does not exist, and 

$$p_v = \frac{v\mu_v}{\phi_v\lambda_v + v\mu_v} \tag{37b}$$

and 

$$s_v = \frac{\phi_v\lambda_v}{\phi_v\lambda_v + v\mu_v} \tag{37c}$$

are, respectively, the probabilities of going from voice state v to state 
$v - 1$ and from voice state v to state $v + 1$. In Appendix C we show 


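$$T_v = \frac{1}{\lambda_v}\sum_{j=v}^{v_0-1}\frac{1}{\psi_j}\sum_{m=0}^{j}\frac{j!}{m!}\left(\frac{\mu_v}{\lambda_v}\right)^{j-m}\psi_{m-1}, \tag{38}$$

where 

$$\psi_j = \prod_{m=0}^{j}\phi_m \tag{39}$$

and $\psi_{-1} = 1$. 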

Therefore, computation of $T_v$ and $p^*(v)$ by way of (38) and the 
method given in Appendix B, respectively, yields the mean time 
between unstable periods. The relative frequency, or probability, that 
the system is in an unstable state ($v = v_0$) is approximated by 

$$\frac{\tau_w}{\tau_w + T}, \tag{40}$$

where $\tau_w$ and T are given by (25) and (32), respectively. Notice that 
this expression should be approximately equal to the value of $p(v_0)$ 
obtained using (5). 
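The first passage times $T_v$ can also be obtained numerically without a closed form, by treating the voice state as a birth-death process with upward rate $\phi_v\lambda_v$ and downward rate $v\mu_v$. The sketch below is one such computation, offered as an illustration rather than the method of Appendix C; the function name is hypothetical, and the entrance probabilities are passed in as a list assumed to have been computed separately.

```python
def first_passage_times(v0, lam_v, mu_v, phi):
    """T[v] = mean time to first reach voice state v0 from state v, v = 0..v0.

    phi[j] is the entrance probability in voice state j, for j = 0..v0-1.
    Illustrative helper based on the standard birth-death recursion.
    """
    h = []                                   # h[j]: mean time to go from j to j + 1
    for j in range(v0):
        up = phi[j] * lam_v                  # accepted-arrival rate in state j
        down = j * mu_v                      # voice departure rate in state j
        prev = h[j - 1] if j > 0 else 0.0
        h.append(1.0 / up + (down / up) * prev)
    T = [0.0] * (v0 + 1)
    for v in range(v0 - 1, -1, -1):          # accumulate from v0 downward
        T[v] = T[v + 1] + h[v]
    return T


# Small illustrative case with every arrival accepted.
print(first_passage_times(2, 1.0, 1.0, [1.0, 1.0]))
```

The values returned satisfy the difference equation for $T_v$ with $T_{v_0} = 0$, which gives a direct check on any closed-form evaluation.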

This completes the presentation of analytical results. These results 
are used in the next section to evaluate the performance of an 
integrated voice-data TDMA network, and to demonstrate the 
improvement over analogous TDMA networks that handle only one traffic 
type. 


IV. PERFORMANCE RESULTS 


The objective of this section is to demonstrate how the integrated 
voice-data network described in Sections I and II performs as a 
function of (1) input traffic intensity, (2) traffic blend (ratio of voice 
traffic intensity to total traffic intensity), and (3) system size, as 
measured by the number of time slots. In all cases, half-duplex voice 
traffic is assumed. We expect the full-duplex case, where each voice 
message requests two time slots, to exhibit similar behavior. Denoting 
the voice traffic intensity as $\rho_v = \lambda_v/(c\mu_v)$, and the data traffic intensity 
as $\rho_d = \lambda_d/(c\mu_d)$, where c is the number of time slots, the total 
normalized offered load is defined as 

$$\rho = \rho_v + \rho_d, \tag{41}$$

and the traffic blend is 

$$r = \frac{\rho_v}{\rho_v + \rho_d}. \tag{42}$$

Initial traffic parameters are selected to produce a preselected value 
of r, and the input traffic intensity $\rho$ is varied between zero and one 
by multiplying both $\lambda_v$ and $\lambda_d$ by a constant. The service rates used 
are $\mu_v = 0.001$ and $\mu_d = 0.025$, which corresponds to a mean voice 
message length of 1000 time slots and a mean data message length of 
40 time slots. In practice, the mean voice message is much greater 
than 1000 time slots; however, the analytical results in the last section 
become more accurate as $\mu_v$ decreases relative to $\mu_d$. 
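For a given offered load and traffic blend, the arrival rates follow directly from the definitions of the two traffic intensities. The sketch below shows the conversion, using the service rates quoted above as defaults; the function name is hypothetical.

```python
def arrival_rates(c, rho, r, mu_v=0.001, mu_d=0.025):
    """Return (lam_v, lam_d) for offered load rho and traffic blend r.

    rho_v = r * rho and rho_d = (1 - r) * rho, with rho_x = lam_x / (c * mu_x).
    """
    rho_v = r * rho
    rho_d = (1.0 - r) * rho
    return rho_v * c * mu_v, rho_d * c * mu_d


# Example: 20 slots, offered load 0.7, equal voice and data intensities.
lam_v, lam_d = arrival_rates(20, 0.7, 0.5)
print(lam_v, lam_d)
```

Scaling the offered load while holding the blend fixed multiplies both arrival rates by the same constant, which is how the load is swept in the curves that follow.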

Figures 4a, 4b, and 4c show voice message blocking probabilities 
computed by means of (4) versus the offered load for systems with 20, 
100, and 500 time slots, respectively. In each case, curves are shown 
for three different traffic blends. A few randomly selected points from 
Figs. 4a, 4b, and 4c were compared with results obtained by a computer 
simulation of the queueing model under consideration. In each case 
the approximate result and the computer-simulated result were nearly 
identical. Also shown are plots of blocking probabilities versus load 
produced by systems that handle only (half-duplex) voice traffic and 
which have rc time slots, where c is the number of time slots in the 
voice-data network. These curves are computed directly from (5), 
where $\phi_j = 1$ for $j < rc$. 

At high traffic intensities, blocking probabilities for the integrated 
systems are consistently higher than those for the analogous single 
traffic systems. In this region, data queues often form, causing data 
messages to grab time slots relinquished by voice messages. In contrast, 
at lower traffic intensities, data message queues are less likely, so that 
voice messages often have access to additional time slots. At low traffic 
intensities, integrated systems therefore always exhibit superior per- 
formance when compared with analogous voice-only systems. In each 
comparison, there appears to be a unique traffic intensity, $\rho^*$, where 
both systems give the same blocking probability. Notice that as the 
traffic blend r decreases, $\rho^*$ increases. This is due to the fact that the 
probability of a data message queue existing at a fixed, normalized 
traffic intensity decreases as the number of time slots allocated for 
data traffic increases. As r decreases, voice messages therefore often 
have access to additional time slots not used by data traffic. At a fixed 
traffic intensity, as r decreases, the blocking probability produced by 
the integrated system should therefore decrease, relative to the 
blocking probability produced by the analogous voice-only system. A final 
observation is that as the traffic intensity decreases, blocking 
probabilities obtained using the integrated systems become insensitive to 
the traffic blend. This is in contrast to the analogous voice-only 
systems, which result in a much wider variation in blocking 
probabilities as the number of time slots is varied. 

Fig. 4—Voice message blocking probability versus offered load for integrated systems 
with: (a) 20 slots and single traffic systems with 2, 10, and 18 slots; (b) 100 slots and 
single traffic systems with 10, 50, and 90 slots. 

Fig. 4(c)—Voice message blocking probability versus offered load for integrated 
systems with 500 slots and single traffic systems with 50, 250, and 450 slots. 

Figures 5a, 5b, and 5c show plots of mean data message queue length 
(number of queued data requests), computed by means of (7), (8), and 
(10), versus normalized offered load for systems with 20, 100, and 500 
time slots, respectively. Curves are again shown for three different 
traffic blends. Also plotted is the mean number of queued data requests 
produced by a system handling data traffic only with c time slots. 
Close agreement was again found between randomly selected points 
from these figures and computer simulation results. At high traffic 
intensities, the variation between curves is caused by the different 
traffic loads, at which the mean data queue length approaches infinity. 
In particular, the single traffic curve has its asymptote at $\rho = \rho_d = 1$. 
In contrast, because queued data messages can grab relinquished 
voice-occupied time slots, the integrated traffic curves have asymptotes at 
$\rho_d = 1$, which corresponds to $\rho = 1.1$, $\rho = 2$, and $\rho = 10$ for r = 0.1, 
r = 0.5, and r = 0.9, respectively. Mean queue length in these high 
traffic intensity regions is not of interest, however, since the 
corresponding voice blocking probability is near unity. 

Fig. 5—Mean data message queue length versus offered load for integrated systems 

Fig. 5(c)—Mean data message queue length versus offered load for integrated systems 
and a single traffic system with 500 slots. 
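The asymptote locations quoted for the integrated curves can be verified in one line (a worked step added here for clarity). Since the traffic blend is the voice share of the total load, the data intensity is $\rho_d = (1 - r)\rho$, and setting $\rho_d = 1$ gives

$$\rho = \frac{1}{1-r}: \qquad r = 0.1 \Rightarrow \rho \approx 1.11, \qquad r = 0.5 \Rightarrow \rho = 2, \qquad r = 0.9 \Rightarrow \rho = 10.$$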

Figures 6a, 6b, and 6c show plots of $E(d_M)$ given by (23) versus 
offered load, which indicates the mean value of the maximum of queue 
buildups. The discontinuities in Fig. 6a are caused by discontinuous 
changes in the voice instability state $v_0$ as a function of traffic load. 
As an example, for the case c = 20 and r = 0.5 shown in Fig. 6a, $v_0$ 
changes from 14 to 13 as the traffic intensity $\rho$ increases from $0.7 - \epsilon$ 
to 0.7, where $\epsilon$ is small. Discontinuities were observed in all curves 
shown in Fig. 6; however, in most cases these discontinuities were 
hardly noticeable. In particular, as the number of slots c increases, 
$E(d_M)$ becomes less sensitive to changes in $v_0$. A comparison of the 
results in Figs. 6a, 6b, and 6c with simulated sample paths of the 
data-message queue process indicates that the results presented here are 
typically about 10 to 25 percent smaller than the actual peaks observed, 
indicating that the peaks that occur when $v < v_0$ are often greater 
than those that occur when $v = v_0$. As $\mu_v$ decreases relative to $\mu_d$, 
however, $E(d_M)$ given by (23) should give a more accurate indication 
of the mean value of these peaks. 

Fig. 6—Approximate mean value of queue peaks versus offered load from eq. (23) for 
integrated systems with: (a) 20 slots and (b) 100 slots. 

Fig. 6(c)—Approximate mean value of queue peaks versus offered load from eq. (23) 
for integrated systems with 500 slots. 

For each point computed in Figs. 4 through 6, the probability of 
being in an “unstable” state $p(v_0)$, the mean duration of the unstable 
state ($\tau_w$), and the mean time between unstable states (T) were 
calculated by means of the results in Section 3.1. A few representative 
points are listed in Table I. The probability of being in an unstable 
state and the resulting queues that form are typically too small to 
cause significant degradation in system performance. 
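As a consistency check on the tabulated values (a worked step added here, using the Table I entry for 20 time slots at an offered load of 1.0), the fraction of time spent in the unstable state estimated from the spike duration and the time between spikes is

$$\frac{\tau_w}{\tau_w + T} = \frac{191.8}{191.8 + 2898} \approx 0.062,$$

which agrees with the listed value of $p(v_0)$ for that case.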


V. CONCLUSIONS 


Results obtained from the analysis of an integrated voice-data 
TDMA protocol indicate that voice message blocking probability in 
the integrated system is insensitive to the blend of traffic at low input 
traffic intensities. As the traffic intensity increases, the blocking 
Table I—Mean duration of unstable state ($\tau_w$, in number of frames), 
mean time between unstable states (T, in number of frames), and the 
probability of being in an unstable state [$p(v_0)$] for a few 
representative cases (assuming r = 0.5) 


Time Slots Load (ρ) τ_w T p(v_0) 
20 0.6 129.2 2.05·10^5 6.3·10^-4 
20 1.0 191.8 2898 0.062 
100 0.7 44.8 1.85·10^9 2.4·10^-8 
100 1.0 60.8 4.15·10^4 0.0015 
500 0.85 6.66 - 10° 4,12-10"° 1.6-10-® 
500 1.0 2.88.10" 5.87-1073 4.9.1077 


probability of the integrated system increases, relative to the blocking 
probability of the analogous voice-only system. The traffic intensity 
at which the two blocking probability curves intersect is a function of 
the traffic blend. Mean queue length in the integrated system displays 
a wide variation with traffic blend at high traffic intensities, due to 
the variation in traffic intensity at which instability occurs. At offered 
loads of 0.7 to 0.8, very good performance can be achieved (i.e., a 
blocking probability <0.01 and a mean queue length near zero) with 
moderately sized systems (~100 slots/frame). Finally, the data mes- 
sage queues that form during the unstable transients are moderate for 
the cases examined, and the frequency at which these transients occur 
is in most cases quite small. As the number of time slots per TDMA 
frame increases, results presented here show significant improvements 
in system performance. This is an important observation since most 
practical networks are much larger than those considered here (i.e., 
greater than 1000 time slots per frame). 
APPENDIX A 


We wish to show (10) using the approximation (9). In addition, we 
assume that $E[q(t)]$ is piecewise linear. (This would be true, for 
instance, if the data message queue length $q(t)$ were allowed to assume 
negative values.) At time t = 0 we assume that the state of the system 
is $(v_0, c - v_0)$. Let $t_v(i)$ denote the mean time it takes i + 1 voice 
messages to relinquish their time slots. Then 

$$t_v(i) = \sum_{j=0}^{i}\frac{1}{(v_0-j)\mu_v} \tag{43}$$

and 

$$E[q(t)] = \begin{cases} q_M v_0\mu_v t, & 0 < t \le t_v(0), \\ E\{q[t_v(i)]\} - [(c-v_0+i+1)\mu_d - \lambda_d][t - t_v(i)], & t_v(i) < t \le t_v(i+1), \end{cases} \tag{44}$$

where $q_M$ is the peak value of $E[q(t)]$ and is given by 

$$q_M = \frac{\lambda_d - (c-v_0)\mu_d}{v_0\mu_v}. \tag{45}$$

Notice that for $t > t_v(0)$, 

$$E\{q[t_v(i)]\} = q_i = q_{i-1} - [(c-v_0+i)\mu_d - \lambda_d][t_v(i) - t_v(i-1)] = q_M - \sum_{j=1}^{i}\frac{(c-v_0+j)\mu_d - \lambda_d}{(v_0-j)\mu_v}. \tag{46}$$
To calculate the area under $E[q(t)]$, we must first compute 

$$t_0 = \inf\{t \mid E[q(t)] = 0 \text{ and } t > 0\}. \tag{47}$$

Letting 

$$i_0 = \max\{i \mid q_i > 0\}, \tag{48}$$

it follows that 

$$t_v(i_0) < t_0 \le t_v(i_0+1). \tag{49}$$

For $t_v(i_0) \le t \le t_0$, 

$$E[q(t)] = q_{i_0} - [(c-v_0+i_0+1)\mu_d - \lambda_d][t - t_v(i_0)], \tag{50}$$

and setting $E[q(t_0)] = 0$ gives 

$$t_0 = \frac{q_{i_0}}{(c-v_0+i_0+1)\mu_d - \lambda_d} + t_v(i_0). \tag{51}$$

A plot of $E[q(t)]$, $0 \le t \le t_0$, assuming three voice departures ($i_0 = 2$) 
is shown in Fig. 7. If we use (9), it follows that 





1 igt1 
E(q|v0) =— X Aj, (52) 
; to j=0 
where A; is the area of region R;. It is apparent that 
1 qu _ 1 da (¢ — Vo)Ma 
Ao == = - —_, 53 
© 2 t(0) 2 (Vote)? me 
and that 
1 nace 
Ain = 3 [to — ty(to) Gi, 
~92 
Jip 
a a ATE 54 
2[(c — Vo + lo + 1)ua — dal ee) 
Finally, regions $R_1, \ldots, R_{i_0}$ are trapezoids with upper boundary
$E[q(t)]$, so that for $1 \le j \le i_0$,

$$A_j = \frac{1}{2}[t_v(j) - t_v(j - 1)](q_{j-1} + q_j) = \frac{q_{j-1} + q_j}{2(v_0 - j)\mu_v}. \tag{55}$$

[Fig. 7: Mean queue length versus time during transient instability. Vertical axis: $E[q(t)]$; horizontal axis: time, marked at 0, $t_v(0)$, $t_v(1)$, $t_v(2)$.]

Substituting (51) and (53) through (55) into (52), and using (43) and
(46) gives (10).
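The area computation of this appendix can be sketched numerically. The following is a minimal illustration, not the authors' program: it assumes that (43) through (55) have the piecewise-linear form described above (a triangle up to the first voice departure, trapezoids between later departures, and a final triangle down to the emptying time $t_0$). The function name `mean_queue_length` and its argument names are hypothetical.

```python
def mean_queue_length(c, v0, lam_d, mu_d, mu_v):
    """Approximate E(q | v0) as (area under piecewise-linear E[q(t)]) / t0."""
    q = (lam_d - (c - v0) * mu_d) / (v0 * mu_v)   # q_M, assumed form of (45)
    if q <= 0:
        raise ValueError("queue never builds up: lam_d <= (c - v0) * mu_d")
    t = 1.0 / (v0 * mu_v)                         # t_v(0), from (43)
    area = 0.5 * q * t                            # triangle A_0, as in (53)
    for i in range(v0):
        # Net drain rate of the data queue after voice departure i.
        drain = (c - v0 + i + 1) * mu_d - lam_d
        remaining = v0 - i - 1                    # voice messages still holding slots
        dt = 1.0 / (remaining * mu_v) if remaining > 0 else float("inf")
        if drain > 0 and q - drain * dt <= 0:
            # Queue empties before the next departure, so i plays the role of i_0.
            t0 = t + q / drain                    # as in (51)
            area += 0.5 * q * q / drain           # final triangle, as in (54)
            return area / t0                      # as in (52)
        if remaining == 0:
            raise ValueError("queue never empties: lam_d >= c * mu_d")
        q_next = q - drain * dt                   # recursion in (46)
        area += 0.5 * dt * (q + q_next)           # trapezoid, as in (55)
        q, t = q_next, t + dt
    raise AssertionError("unreachable")
```

For instance, with the hypothetical values `c=10, v0=8, lam_d=2.5, mu_d=1.0, mu_v=0.125`, the queue peaks at 0.5 at time 1 and empties at time 2, so the sketch returns 0.25.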


APPENDIX B 


We are interested in computing $D\{1/[r_2(v_0\mu_v)], v_0 - 1\}$ and
$D\{1/[r_2(v_0\mu_v)], v \mid v_1\}$, where $D(z, v)$ and $D(z, v \mid v_1)$ are given,
respectively, by (31) and (35) and $r_2(v_0\mu_v)$ is given by (18b). Multiplying
both sides of (27a) by $z^{-d}$ and summing from $d = 1$ to infinity gives, after
algebraic manipulation,

$$D(z, v) = \frac{(z - 1)[v\mu_v D(z, v - 1) - \lambda_d \pi_{1,v}] + 1}{z(1 - z)\, s(v, z^{-1})}, \tag{56}$$

where

$$s(v, z) = \mu_d(c - v)z^2 - [(c - v)\mu_d + \lambda_d + v\mu_v]z + \lambda_d, \tag{57}$$


which has real roots

$$r_k(v) = \frac{1}{2(c - v)\mu_d}\left\{\sigma(v) \pm \sqrt{\sigma^2(v) - 4(c - v)\mu_d\lambda_d}\right\}, \tag{58}$$

where $k = 1$ corresponds to "+", $k = 2$ corresponds to "−", and $\sigma(v)$ is
given by (28). Notice that the roots $r_k(v_0)$ given by (58) are identical
to the roots $r_k(v_0\mu_v)$ given by (18). For convenience, we therefore refer
to the roots $r_k(v_0\mu_v)$ as $r_k(v_0)$, for $k = 1, 2$. From Rouché's Theorem,
it follows that $r_2(v) < 1$ and $r_1(v) \ge 1$. The z-transform, $D(z, v)$, must
be analytic outside the unit circle, and hence $\pi_{1,v}$ in (56) must be
selected to cancel the pole at $1/[r_2(v)]$. In particular, $s(v, z^{-1})$ has roots
$1/[r_1(v)] \le 1$ and $1/[r_2(v)] > 1$, so that

$$\pi_{1,v} = \frac{v\mu_v}{\lambda_d}\, D\left\{\frac{1}{r_2(v)}, v - 1\right\} + \frac{1}{\lambda_d}\,\frac{r_2(v)}{1 - r_2(v)}. \tag{59}$$
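The root locations claimed above are easy to check numerically. The sketch below assumes that $\sigma(v)$ in (28) equals the linear coefficient of $s(v, z)$ in (57), that is $(c - v)\mu_d + \lambda_d + v\mu_v$, and then applies the quadratic formula. The function names `s_poly` and `roots` are hypothetical.

```python
import math

def s_poly(v, z, c, lam_d, mu_d, mu_v):
    """Evaluate s(v, z) as written in (57)."""
    a = (c - v) * mu_d
    return a * z * z - (a + lam_d + v * mu_v) * z + lam_d

def roots(v, c, lam_d, mu_d, mu_v):
    """Return (r1, r2): the '+' and '-' roots of s(v, z). Requires v < c."""
    a = (c - v) * mu_d
    sigma = a + lam_d + v * mu_v            # assumed form of sigma(v) from (28)
    disc = math.sqrt(sigma * sigma - 4.0 * a * lam_d)
    return (sigma + disc) / (2.0 * a), (sigma - disc) / (2.0 * a)
```

Because $s(v, 1) = -v\mu_v \le 0$ while the leading coefficient is positive, the point $z = 1$ lies between the two roots, which agrees with $r_2(v) < 1 \le r_1(v)$ whenever $v\mu_v > 0$.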





As an example, suppose that we assume the data message queue
empties with probability one before $k + 1$ voice messages relinquish
their time slots. Once the voice state becomes $v = v_0 - k - 1$, the voice
service rate $v\mu_v = 0$. Substituting these values for $\mu_v$ and $v$ in (56) gives

$$D(z, v_0 - k - 1) = \frac{-z[1 - \lambda_d(z - 1)\pi_{1,v_0-k-1}]}{\mu_d(c - v_0 + k + 1)(z - 1)[1 - r_1(v_0 - k - 1)z][1 - r_2(v_0 - k - 1)z]}, \tag{60}$$

where from (58),

$$r_1(v_0 - k - 1) = 1 \quad \text{and} \quad r_2(v_0 - k - 1) = \frac{\lambda_d}{(c - v_0 + k + 1)\mu_d}. \tag{61}$$

Selecting $\pi_{1,v_0-k-1}$ to cancel the pole at $1/[r_2(v_0 - k - 1)]$ and simplifying
gives

$$D(z, v_0 - k - 1) = \frac{z}{(z - 1)^2}\,\frac{1}{\mu_d(c - v_0 + k + 1) - \lambda_d}, \tag{62}$$

which has inverse transform (as a sequence in $d$)

$$\frac{d}{\mu_d(c - v_0 + k + 1) - \lambda_d}. \tag{63}$$
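The simplification that leads to (62) can be traced step by step. The lines below are a sketch of that algebra, under the assumption that (56) has the form $D(z, v) = \{(z - 1)[v\mu_v D(z, v - 1) - \lambda_d \pi_{1,v}] + 1\} / \{z(1 - z)s(v, z^{-1})\}$, with $v = v_0 - k - 1$ and $v\mu_v = 0$, so that $r_1(v) = 1$.

```latex
\begin{align*}
s(v, z^{-1}) &= \frac{\mu_d (c - v)\,(1 - z)\,[1 - r_2(v) z]}{z^{2}}
  && \text{from (57) with } v\mu_v = 0 \\
\pi_{1,v} &= \frac{r_2(v)}{\lambda_d\,[1 - r_2(v)]}
  && \text{numerator of (56) vanishes at } z = 1/r_2(v) \\
1 - \lambda_d (z - 1)\,\pi_{1,v} &= \frac{1 - r_2(v) z}{1 - r_2(v)} \\
D(z, v) &= \frac{z}{(z - 1)^{2}\,\mu_d (c - v)\,[1 - r_2(v)]}
         = \frac{z}{(z - 1)^{2}\,[\mu_d (c - v) - \lambda_d]}
  && \text{using } r_2(v) = \frac{\lambda_d}{(c - v)\mu_d}
\end{align*}
```

Since $\sum_{d \ge 1} d\, z^{-d} = z/(z - 1)^2$, the last line is the transform of a sequence that grows linearly in $d$ with slope $1/[\mu_d(c - v) - \lambda_d]$.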


Equation (62) constitutes an initial condition for (56), which can be
iterated numerically using (59). In particular, assuming no more than
$k$ voice messages can relinquish their time slots after the queue begins
to empty, $D(z, v_0 - k - 1)$ for $z = 1/[r_2(v_0)]$ and $z = 1/[r_2(v_0 - k)]$ is
calculated from (62). The value of $\pi_{1,v_0-k}$ is subsequently computed
from (59) and is used in (56) to compute $D(z, v_0 - k)$ for $z = 1/[r_2(v_0)]$
and $z = 1/[r_2(v_0 - k + 1)]$. Equation (59) is subsequently used to
compute $\pi_{1,v_0-k+1}$, which is used to compute $D(z, v_0 - k + 1)$, and so
forth until $D\{1/[r_2(v_0)], v_0 - 1\}$ is computed.
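The iteration just described can be sketched as a memoized recursion. This is an illustration under stated assumptions, not the authors' code: it assumes (56) reads $D(z, v) = \{(z - 1)[v\mu_v D(z, v - 1) - \lambda_d\pi_{1,v}] + 1\}/\{z(1 - z)s(v, z^{-1})\}$, that (59) reads $\pi_{1,v} = (v\mu_v/\lambda_d)D\{1/r_2(v), v - 1\} + r_2(v)/\{\lambda_d[1 - r_2(v)]\}$, that (62) gives the starting value at $v = v_0 - k - 1$, and that $\sigma(v)$ is the linear coefficient in (57). The name `make_D` is hypothetical.

```python
import math
from functools import lru_cache

def make_D(c, v0, k, lam_d, mu_d, mu_v):
    """Return (D, r2), where D(z, v) follows the assumed forms of (56), (59), (62)."""
    v_min = v0 - k - 1                      # state in which voice departures stop

    def r2(v):
        a = (c - v) * mu_d
        sigma = a + lam_d + v * mu_v        # assumed sigma(v)
        return (sigma - math.sqrt(sigma * sigma - 4.0 * a * lam_d)) / (2.0 * a)

    def s_inv(v, z):
        # s(v, 1/z), with s taken from (57)
        x = 1.0 / z
        a = (c - v) * mu_d
        return a * x * x - (a + lam_d + v * mu_v) * x + lam_d

    @lru_cache(maxsize=None)
    def D(z, v):
        if v == v_min:
            # Initial condition, assumed form of (62)
            return z / ((z - 1.0) ** 2 * (mu_d * (c - v) - lam_d))
        rv = r2(v)
        # pi_{1,v} chosen so the numerator vanishes at z = 1/r2(v), as in (59)
        pi_1v = (v * mu_v / lam_d) * D(1.0 / rv, v - 1) + rv / (lam_d * (1.0 - rv))
        num = (z - 1.0) * (v * mu_v * D(z, v - 1) - lam_d * pi_1v) + 1.0
        return num / (z * (1.0 - z) * s_inv(v, z))

    return D, r2
```

A call such as `D, r2 = make_D(10, 8, 1, 2.5, 1.0, 0.125)` followed by `D(1.0 / r2(8), 7)` mirrors the final step of the procedure. Evaluating `D(z, v)` exactly at `z = 1/r2(v)` gives 0/0 in floating point, so the sketch should only be called at the other evaluation points, as the text does.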

To compute $D(z, v \mid v_1)$ given by (35) at $z = 1/[r_2(v_0)]$, we multiply
both sides of (33a) by $z^{-d}$ and sum from $d = 1$ to infinity to get

$$D(z, v \mid v_1 + 1) = \frac{-v\mu_v D(z, v \mid v_1) + \lambda_d\, p(v \mid v_1 + 1, 1) + v\mu_v \delta_{v,v_1} - [\sigma(v) - \lambda_d z]\,\delta_{v,v_1+1}}{z\, s(v, z^{-1})}, \tag{64}$$

where $\delta_{ij}$ is the Kronecker delta. Using the condition (33b) gives the
boundary condition

$$D(z, v_1 \mid v_1 - 1) = 0. \tag{65}$$


Because $D(z, v \mid v_1)$ must be analytic outside the unit circle, $p(v \mid v_1, 1)$
is selected to cancel the pole at $z = 1/[r_2(v)]$. This implies that

$$p(v \mid v_1, 1) = \frac{v\mu_v}{\lambda_d}\left[D\left\{\frac{1}{r_2(v)}, v \mid v_1 - 1\right\} - \delta_{v,v_1-1}\right] + \frac{\sigma(v) r_2(v) - \lambda_d}{\lambda_d r_2(v)}\,\delta_{v,v_1}. \tag{66}$$

To compute $D(z, v \mid v_0 - 1)$ at $z = 1/[r_2(v_0)]$ for $v = v_0 - 1, v_0 - 2,
\ldots, v_0 - k - 1$, where $k$ is the maximum number of voice departures
allowed, the boundary condition (65) is first used in (66) to get

$$p(v_0 - j \mid v_0 - j, 1) = \frac{\sigma(v_0 - j)\, r_2(v_0 - j) - \lambda_d}{\lambda_d\, r_2(v_0 - j)}, \tag{67}$$

which is used in (64) to compute

$$D(z, v_0 - j \mid v_0 - j) = \frac{\lambda_d\, p(v_0 - j \mid v_0 - j, 1) - [\sigma(v_0 - j) - \lambda_d z]}{z\, s(v_0 - j, z^{-1})},$$


where $j$ is initially one and ranges from one to $k + 1$. This expression
is evaluated at $z = 1/[r_2(v_0 - j)]$ and substituted into (66) to obtain
$p(v_0 - j \mid v_0 - j + 1, 1)$, which is used in (64) to compute
$D(z, v_0 - j \mid v_0 - j + 1)$ at the appropriate values of $z$. This procedure
continues until the value of $D\{1/[r_2(v_0)], v_0 - j \mid v_0 - 1\}$ is obtained,
whereupon $j$ is incremented and the procedure starts over again. In this way the
values $D\{1/[r_2(v_0)], v_0 - j \mid v_0 - 1\}$ for $j = 1, 2, \ldots, k + 1$ are generated
systematically.


APPENDIX C 


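We wish to show that (38) is the solution to (36). We first rewrite
(36) as

$$T_{v+1} - T_v = \frac{v\mu_v}{\lambda_v}(T_v - T_{v-1}) - \frac{1}{\lambda_v}. \tag{68}$$

Letting

$$y_v = T_v - T_{v-1}, \tag{69}$$

(68) can be rewritten as

$$y_{v+1} = \frac{v\mu_v}{\lambda_v}\, y_v - \frac{1}{\lambda_v} \quad \text{for } 0 \le v < v_0, \tag{70}$$
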

which can be iterated to give 


k+1 m 
, v!Yv-k- eee ge v!Yv—m 
Yor = (£: York Yo-k ~ 2 2 (e) ut ’ (71) 


Xv (v ell (es 1)!y, Av m=0 \Av (v — m)\yy 
where y, is given by (39). Using the initial condition, 
$$y_1 = T_1 - T_0 = -\frac{1}{\lambda_v}, \tag{72}$$


and substituting k = v — 1 in (71) gives 


1 Ly i‘ v! es Bo aa Yv—m-1 
= = — TET A IC 73 
ae dv (: Yv i 2 (é (v — m)! ) 
From (69),

$$T_v = \sum_{j=1}^{v} y_j + T_0, \tag{74}$$

which has boundary condition

$$T_{v_0} = \sum_{j=1}^{v_0} y_j + T_0 = 0. \tag{75}$$

This implies that

$$T_0 = -\sum_{j=1}^{v_0} y_j, \tag{76}$$

so that

$$T_v = -\sum_{j=v+1}^{v_0} y_j. \tag{77}$$


Combining (73) and (77) gives (38). 
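The steps of this appendix can be checked numerically by iterating the first-order recursion and then summing the differences. The sketch below assumes that (70) reads $y_{v+1} = (v\mu_v/\lambda_v)y_v - 1/\lambda_v$ with $y_v = T_v - T_{v-1}$, and that the boundary condition is $T_{v_0} = 0$. It does not use the closed form (38). The function name `solve_T` is hypothetical.

```python
def solve_T(v0, lam_v, mu_v):
    """Return [T_0, ..., T_{v0}] with T_{v0} = 0, from the assumed recursion (70)."""
    y = [0.0] * (v0 + 1)                 # y[j] = T_j - T_{j-1}; y[0] is a placeholder
    for v in range(v0):                  # (70) for 0 <= v < v0
        # At v = 0 the first term vanishes, which reproduces y_1 = -1/lam_v.
        y[v + 1] = (v * mu_v / lam_v) * y[v] - 1.0 / lam_v
    T = [0.0] * (v0 + 1)                 # T[v0] = 0 is the boundary condition
    for v in range(v0 - 1, -1, -1):
        T[v] = T[v + 1] - y[v + 1]       # equivalent to T_v = -sum_{j=v+1}^{v0} y_j
    return T
```

With the hypothetical values `v0=2, lam_v=1.0, mu_v=0.5`, the recursion gives $y_1 = -1$ and $y_2 = -1.5$, so `solve_T` returns `[2.5, 1.5, 0.0]`.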